Get Your Data Together! Algorithms for Managing Data Lakes

Published April 8, 2019, 18:12
Data lakes (e.g., enterprise data catalogs and Open Data portals) are little more than data dumps if users cannot find and use the data in them. In this talk, I present two problems in massive, dynamic data lakes: 1) searching for joinable tables to discover potential linkages, and 2) joining tables from different sources through auto-generated syntactic transformations of join values. I will also present algorithmic solutions that scale to data lakes that are large in both the number of tables (millions) and the size of individual tables. The presented work has been published at SIGMOD and VLDB.
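To make the first problem concrete, here is a minimal sketch of joinable-table search framed as ranking columns by how many distinct values they share with a query column. This is a naive, exact baseline built on an inverted index, not the algorithm presented in the talk; the function names, the toy lake, and the choice of raw overlap as the ranking score are all assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(columns):
    """Map each distinct value to the set of column names that contain it."""
    index = defaultdict(set)
    for name, values in columns.items():
        for value in set(values):
            index[value].add(name)
    return index


def top_k_joinable(query_values, index, k=3):
    """Rank indexed columns by the number of distinct values shared with the query."""
    overlap = defaultdict(int)
    for value in set(query_values):
        # .get avoids creating empty entries in the defaultdict index.
        for name in index.get(value, ()):
            overlap[name] += 1
    # Sort by overlap (descending), then by name for a deterministic order.
    ranked = sorted(overlap.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Hypothetical toy lake: column name -> values (illustrative data only).
LAKE = {
    "cities.name": ["Toronto", "Ottawa", "Montreal", "Calgary"],
    "offices.city": ["Toronto", "Montreal", "Boston"],
    "products.sku": ["A1", "B2", "C3"],
}

INDEX = build_inverted_index(LAKE)
RESULT = top_k_joinable(["Toronto", "Montreal", "Vancouver"], INDEX)
print(RESULT)
```

A scan like this touches every posting list for every query value, which is exactly what becomes expensive at the scale the abstract describes (millions of tables, large columns); that cost is the motivation for more specialized algorithms.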

See more at microsoft.com/en-us/research/v...