Optimal Moment Estimation in Data Streams

588
28
Опубликовано 17 августа 2016, 0:31
We close the problem of understanding the space complexity of pth moment estimation in data streams for 0 p = 2 by giving the first optimal upper and lower bounds. In this problem, a high-dimensional vector receives a long sequence of coordinate-wise updates, and we must output a low relative-error approximation of the pth moment at the end of the stream while keeping a memory footprint exponentially smaller than the vector length. This problem has applications in network traffic monitoring, databases, and simply computing distance measures across massive datasets. We obtain the upper bound by showing an invariance principle for sums of boundedly independent p-stable random variables, which may be of independent interest. Our main ingredient in this proof is the introduction of an approximation theoretic tool we dub 'FT-mollification', which has since found other applications in agnostic learning, pseudorandomness, and approximation theory. We obtain the lower bound by showing a direct sum property for the one way communication complexity of the GapHamming problem. Joint work with Daniel M. Kane (Harvard) and David P. Woodruff (IBM Almaden)
автотехномузыкадетское