Microsoft Research · 334K
Published May 29, 2019, 20:38
Differential privacy is considered a de facto standard for private data analysis. However, the definition and much of the supporting literature apply only to flat tables. While there exist variants of the definition and specialized algorithms for specific types of relational data (e.g. graphs), there is no general privacy definition for multi-relational schemas with constraints, and no system that answers SQL queries accurately under differential privacy while enforcing a fixed privacy budget across all queries posed by the analyst.
This work presents PrivSQL, a first-of-its-kind end-to-end differentially private relational database system. PrivSQL allows an analyst to query data stored in a standard database management system using a rich class of SQL counting queries. PrivSQL adopts a novel generalization of differential privacy to multi-relational data that takes into account constraints in the schema like foreign keys, and allows the data owner to flexibly specify entities in the schema that need privacy. Unlike prior work that only bounds the privacy loss for each query posed by the analyst, PrivSQL ensures a fixed privacy loss across all the queries posed by the analyst. PrivSQL achieves this by answering queries on private synopses generated from several views over the base relation that are tuned to have low error on a representative query workload. We experimentally evaluate PrivSQL on a real-world dataset and a workload of more than 3,600 queries. We show that for 50% of the queries PrivSQL offers at least 1,000 times better error rates than solutions adapted from prior work.
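The idea of answering all queries from a private synopsis, so that the privacy loss is fixed no matter how many queries are posed, can be illustrated with a minimal sketch. This is not PrivSQL's implementation: it assumes a single flat table in which each protected entity contributes exactly one row (so a histogram has sensitivity 1), it uses the standard Laplace mechanism, and the names `build_synopsis` and `answer_count` are hypothetical. PrivSQL's handling of foreign keys, view selection, and workload tuning is not modeled here.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def build_synopsis(rows, domain, epsilon, rng):
    """Spend the whole budget `epsilon` once on a noisy histogram.

    Assumption: adding or removing one entity changes exactly one cell
    count by 1, so Laplace noise with scale 1/epsilon suffices.
    """
    counts = Counter(rows)
    return {
        cell: counts.get(cell, 0) + laplace_noise(1.0 / epsilon, rng)
        for cell in domain
    }


def answer_count(synopsis, predicate):
    # Post-processing of the synopsis: touches no raw data,
    # so it consumes no additional privacy budget.
    return sum(value for cell, value in synopsis.items() if predicate(cell))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy data: (age_group, region) per person.
    domain = [(a, r) for a in ("young", "old") for r in ("north", "south")]
    rows = [("young", "north")] * 5 + [("old", "south")] * 3

    synopsis = build_synopsis(rows, domain, epsilon=1.0, rng=rng)

    # Any number of counting queries can now be answered from the synopsis.
    print(answer_count(synopsis, lambda c: c[0] == "young"))
    print(answer_count(synopsis, lambda c: c[1] == "south"))
```

Because every answer is computed from the noisy synopsis rather than from the raw rows, the total privacy loss stays at `epsilon` regardless of how many queries are asked. Per-query mechanisms, by contrast, add a fresh privacy cost for each answer.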
See more at microsoft.com/en-us/research/v...