Differentially Private Learning on Large, Online and High-dimensional Data

Published July 28, 2016, 0:41
In this talk I will focus on two major aspects of differentially private learning: i) learning from high-dimensional data, and ii) learning from data sets where the samples arrive online. To that end I will discuss two recent works:

i) Differentially Private Convex Empirical Risk Minimization and High-dimensional Regression (joint work with Daniel Kifer and Adam Smith from Pennsylvania State University).

Main idea: We consider differentially private algorithms for convex empirical risk minimization in both low-dimensional and high-dimensional settings. Differential privacy is a recently introduced notion of privacy which guarantees that an algorithm's output does not depend strongly on the data of any individual record in the dataset. This notion relates very closely to the notion of algorithmic stability in learning theory. In our work:

a) We significantly extend the analysis of the "objective perturbation" algorithm of Chaudhuri et al. (2011) for convex ERM problems. We show that their method can be modified to use less noise (and hence be more accurate), and can be made to apply to problems with hard constraints and non-differentiable regularizers. Additionally, we give a tighter, data-dependent analysis of the error introduced by their method. A key tool in our analysis is a new, nontrivial limit theorem for differential privacy which is of independent interest: if a sequence of differentially private algorithms converges, in a weak sense, then the limit algorithm is also differentially private.

b) We give the first two private algorithms for sparse regression problems in high-dimensional settings, where the dimensionality p of the problem is much larger than the sample size n. We analyze their performance for linear regression: under standard assumptions on the data, our algorithms have sample complexity n = poly(s, log p) when there exists a good underlying parameter vector with support of size at most s.
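To make the objective-perturbation idea concrete, here is a minimal sketch for L2-regularized logistic regression: noise is added to the objective itself (a random linear term b·theta) before minimizing, so the minimizer is randomized. The data-normalization assumptions, parameter names, and the plain gradient-descent solver are illustrative choices, not the exact algorithm from the talk.

```python
import numpy as np

def objective_perturbation(X, y, eps, lam=0.1, steps=500, lr=0.5, seed=0):
    """Sketch of objective perturbation (after Chaudhuri et al., 2011)
    for L2-regularized logistic regression. Assumes each row of X has
    L2 norm <= 1 and labels y are in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape

    # Sample noise b with density proportional to exp(-(eps/2) * ||b||):
    # a uniform direction on the sphere, with norm drawn from Gamma(d, 2/eps).
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    b = direction * rng.gamma(shape=d, scale=2.0 / eps)

    # Minimize: mean logistic loss + (lam/2)||theta||^2 + (b . theta)/n
    theta = np.zeros(d)
    for _ in range(steps):
        margins = y * (X @ theta)
        sigma = 1.0 / (1.0 + np.exp(margins))  # = d/dm log(1 + e^{-m}), negated
        grad = -(X * (y * sigma)[:, None]).mean(axis=0) + lam * theta + b / n
        theta -= lr * grad
    return theta
```

Because the noise enters the objective rather than the output, the strong convexity from the regularizer keeps the accuracy loss small; the talk's point a) is about tightening exactly this kind of analysis.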
ii) Differentially Private Online Learning (joint work with Prateek Jain from MSR India and Pravesh Kothari from the University of Texas at Austin).

Main idea: We consider the problem of preserving privacy in the context of online learning. Online learning involves learning from data in real time, so the learned model, as well as its predictions, is continuously changing. We study the problem in the framework of online convex programming (OCP), a popular online learning setting with several theoretical and practical implications, while using differential privacy as the formal measure of privacy. For this problem, we provide a generic framework that can be used to convert any given OCP algorithm into a private OCP algorithm with provable privacy as well as regret (utility) guarantees, provided that the given OCP algorithm satisfies the following two criteria: 1) linearly decreasing sensitivity, i.e., the effect of a new data point on the learned model decreases linearly over time, and 2) sub-linear regret. We instantiate our framework with two commonly used OCP algorithms: i) Generalized Infinitesimal Gradient Ascent (GIGA) and ii) Implicit Gradient Descent (IGD).
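The "linearly decreasing sensitivity" criterion can be illustrated with a toy private online gradient descent loop: with step size eta_t = 1/t, a single data point moves the iterate by at most O(1/t), so the noise needed to release each iterate can shrink over time. The simple per-round Laplace noise below is a simplified stand-in for the framework's actual mechanism, and all names are illustrative.

```python
import numpy as np

def private_online_gd(gradients, d, eps, grad_bound=1.0, seed=0):
    """Toy private online gradient descent. `gradients` yields one
    gradient vector per round, each with L2 norm <= grad_bound."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(d)
    released = []
    for t, g in enumerate(gradients, start=1):
        eta = 1.0 / t
        theta = theta - eta * g               # GIGA-style gradient step
        sensitivity = 2.0 * grad_bound * eta  # one point's influence ~ 1/t
        noise = rng.laplace(scale=sensitivity / eps, size=d)
        released.append(theta + noise)        # publish only the noisy iterate
    return released
```

Since the sensitivity, and hence the noise scale, decays like 1/t, the cumulative error added for privacy grows only sub-linearly, which is what lets the private algorithm inherit a sub-linear regret bound from the non-private one.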