A Machine Learning Perspective on Managing Noisy Structured Data

2 054
14
Опубликовано 11 ноября 2019, 22:45
Modern analytics depend on high-effort tasks like data preparation and data cleaning to produce accurate results. This talk describes recent work on making routine data preparation tasks such as data cleaning dramatically easier. I will first introduce a formal probabilistic framework to describe the quality of structured data and demonstrate how this framework allows us to cast data cleaning as a statistical learning and inference problem. I will then show how this connection allows us to obtain formal guarantees on automated data cleaning and describe how it forms the basis of the HoloClean framework, a state-of-the-art ML-based solution for managing noisy structured data. I will close with additional examples of how a statistical learning view on managing noisy data can lead to new solutions to classical database problems such as the discovery of functional dependencies in structured data.

See more at microsoft.com/en-us/research/v...
автотехномузыкадетское