Generic Entity Resolution

Published September 6, 2016, 5:44
Entity resolution (ER) is a problem that arises in many information integration scenarios: we have two or more sources containing records on the same set of real-world entities (e.g., customers). However, there are no unique identifiers that tell us which records from one source correspond to those in the other sources. Furthermore, records representing the same entity may carry differing information, e.g., one record may have the address misspelled, and another may be missing some fields. An ER algorithm attempts to identify the matching records from multiple sources (i.e., those corresponding to the same real-world entity) and merges the matching records as best it can. In this talk I will describe a generic ER approach in which the functions for comparing and merging records are black boxes, invoked on pairs of records. I will describe a set of important properties that the black-box functions should satisfy to enable efficient and deterministic ER algorithms, and I will present an algorithm, Swoosh, that significantly reduces the number of calls to these functions. In addition, I will discuss how ER can be performed when confidences are associated with the input records and with the match and merge functions.
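To make the black-box idea concrete, here is a minimal sketch in Python. It is not the Swoosh algorithm as presented in the talk; it is a simple pairwise match-and-merge loop written in that spirit, under stated assumptions. The record representation (a frozen set of attribute/value pairs), the example `match` rule (shared email or phone), the union-based `merge`, and the names `resolve`, `pending`, and `resolved` are all hypothetical choices made for illustration.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

# Hypothetical record representation: a set of (attribute, value) pairs.
Record = FrozenSet[Tuple[str, str]]

def match(a: Record, b: Record) -> bool:
    """Example black-box match (an assumption): same email or same phone."""
    identifying = {"email", "phone"}
    return any(attr in identifying and (attr, val) in b for attr, val in a)

def merge(a: Record, b: Record) -> Record:
    """Example black-box merge (an assumption): keep every value seen."""
    return a | b

def resolve(
    records: Iterable[Record],
    match: Callable[[Record, Record], bool],
    merge: Callable[[Record, Record], Record],
) -> List[Record]:
    """Repeatedly match and merge until no two resolved records match."""
    pending = list(records)       # records still to be processed
    resolved: List[Record] = []   # records known not to match each other
    while pending:
        current = pending.pop()
        # Compare only against the resolved set, stopping at the first match.
        partner = next((r for r in resolved if match(current, r)), None)
        if partner is None:
            resolved.append(current)
        else:
            # The merged record replaces both inputs and is re-examined,
            # since it may now match records that neither input matched.
            resolved.remove(partner)
            pending.append(merge(current, partner))
    return resolved

r1 = frozenset({("name", "J. Doe"), ("email", "jd@example.com")})
r2 = frozenset({("name", "John Doe"), ("phone", "555-0100")})
r3 = frozenset({("email", "jd@example.com"), ("phone", "555-0100")})
r4 = frozenset({("name", "Ann Lee"), ("email", "ann@example.com")})

result = resolve([r1, r2, r3, r4], match, merge)
```

In this example r1 and r2 do not match each other directly, but both match r3, so all three end up in a single merged record while r4 stays separate. The union-based merge used here is idempotent, commutative, and associative, which is the kind of behavior that plausibly keeps the outcome independent of processing order; a black-box pair without such properties may give order-dependent results.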