Data Analytics: Integration and Privacy

Published August 12, 2016, 0:28
Data analytics has become an important and challenging problem in disciplines such as computer science, biology, and medicine. As massive amounts of data become available for analysis, scalable integration techniques become essential. At the same time, new privacy issues arise because one's sensitive information can easily be inferred from large amounts of data. In my talk, I will first focus on the problem of entity resolution (ER), which identifies database records that refer to the same real-world entity. In practice, ER is not a one-time process; it is continually improved as the data, schema, and application are better understood. I will address the problem of keeping the ER result up to date when the ER logic 'evolves' frequently. A naive approach that re-runs ER from scratch may be too expensive for large datasets. I will show when and how we can instead exploit previously 'materialized' ER results to avoid redundant work under the evolved logic. Next, I will introduce my work on managing information leakage, where the goal is to prevent important pieces of information from being resolved by ER in order to preserve data privacy. As more of our sensitive data is exposed to a variety of merchants, health care providers, employers, social sites, and so on, there is a higher chance that an adversary can 'connect the dots' and piece together our information, leading to even greater loss of privacy. I will explain our information leakage model and propose using disinformation as a tool for containing information leakage.
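To make the idea of reusing materialized ER results concrete, here is a minimal sketch in Python. It is not the method presented in the talk; it only illustrates one simple case under stated assumptions: ER is taken to be the transitive closure of a pairwise match rule, and the evolved rule is assumed to be stricter than the old one (every pair it matches was also matched before). Under those assumptions, new clusters can only split old clusters, so each old cluster can be re-resolved on its own. The names `resolve`, `resolve_incremental`, `same_zip`, `same_zip_and_phone`, and the sample records are all hypothetical.

```python
from itertools import combinations


def resolve(records, match):
    """Cluster records by the transitive closure of a pairwise match rule."""
    parent = {r: r for r in records}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Compare every pair; merge the clusters of matching records.
    for a, b in combinations(records, 2):
        if match(a, b):
            parent[find(a)] = find(b)

    clusters = {}
    for r in records:
        clusters.setdefault(find(r), set()).add(r)
    return {frozenset(c) for c in clusters.values()}


def resolve_incremental(old_clusters, stricter_match):
    """Re-resolve inside each old cluster only.

    Assumption: stricter_match never matches a pair the old rule rejected,
    so no new cluster can span two old clusters.
    """
    result = set()
    for cluster in old_clusters:
        result |= resolve(sorted(cluster), stricter_match)
    return result


# Hypothetical records: (name, zip code, phone).
records = [
    ("J. Doe", "94305", "555-0100"),
    ("John Doe", "94305", "555-0100"),
    ("John Doe", "94305", "555-0199"),
    ("Ann Lee", "10001", "555-0123"),
]


def same_zip(a, b):
    # Old (looser) rule: records match if the zip codes agree.
    return a[1] == b[1]


def same_zip_and_phone(a, b):
    # Evolved (stricter) rule: zip code and phone must both agree.
    return a[1] == b[1] and a[2] == b[2]


materialized = resolve(records, same_zip)
updated = resolve_incremental(materialized, same_zip_and_phone)
```

In this sketch, `updated` equals what a from-scratch run with the stricter rule would produce, but pairs that lie in different old clusters are never compared. If the evolved rule is not stricter than the old one, this shortcut is no longer valid and a different strategy is needed.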