On The Algorithmic and System Interface of BIG LEARNING

Published 27 June 2016, 18:12
In many modern applications built on massive data and high-dimensional models, such as web-scale content extraction via topic models, genome-wide association mapping via sparse regression, and image understanding via deep neural networks, one must handle BIG machine learning problems that threaten to exceed the limits of current infrastructures and algorithms. While the ML community continues to pursue new scalable algorithms, and several new system architectures for BIG ML have emerged to address the challenge on the backend, a productive dialogue between ML and systems remains difficult: most algorithmic research remains disconnected from the real systems and data it must eventually face, and the generality, programmability, and theoretical guarantees that most systems offer for ML programs remain largely unclear.

In this talk, I will present Petuum, a general-purpose framework for distributed machine learning, and demonstrate how innovations in scalable algorithms and distributed systems design work in concert to achieve multiple orders of magnitude of scalability on a modest cluster for a wide range of large-scale problems in social networks (mixed-membership inference on 100M nodes), personalized genomic medicine (sparse regression in 100M dimensions), and computer vision (classification over 20K labels), with provable guarantees on the correctness of distributed inference.
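To make the algorithm/system interface concrete, below is a minimal single-process sketch of the stale synchronous parallel (SSP) consistency model that parameter-server systems in the Petuum line of work build on. All names here (SSPParameterServer, can_proceed, the synthetic least-squares problem) are illustrative assumptions for this sketch, not Petuum's actual API: the point is only that workers may run ahead of one another by at most a bounded number of logical clocks, and it is this bounded staleness that makes correctness of distributed inference provable.

import numpy as np

STALENESS = 2    # maximum allowed clock gap between fastest and slowest worker
N_WORKERS = 4
N_TICKS = 200    # scheduling ticks in the toy simulation


class SSPParameterServer:
    """Toy in-memory parameter server enforcing bounded staleness (illustrative, not Petuum's API)."""

    def __init__(self, dim, staleness, n_workers):
        self.params = np.zeros(dim)
        self.staleness = staleness
        self.clocks = [0] * n_workers  # per-worker logical clock

    def can_proceed(self, worker_id):
        # A worker may advance only while it is at most `staleness`
        # clocks ahead of the slowest worker.
        return self.clocks[worker_id] - min(self.clocks) <= self.staleness

    def push(self, grad, lr=0.1):
        self.params -= lr * grad  # apply a (possibly stale) gradient update

    def clock(self, worker_id):
        self.clocks[worker_id] += 1


def run():
    rng = np.random.default_rng(0)
    # Synthetic least-squares problem, with data rows sharded across workers.
    X = rng.normal(size=(400, 5))
    w_true = rng.normal(size=5)
    y = X @ w_true
    shards = np.array_split(np.arange(400), N_WORKERS)

    server = SSPParameterServer(dim=5, staleness=STALENESS, n_workers=N_WORKERS)
    for _ in range(N_TICKS):
        for wid in range(N_WORKERS):
            if rng.random() < 0.5:
                continue  # simulate a straggling worker this tick
            if not server.can_proceed(wid):
                continue  # too far ahead; must wait for stragglers
            idx = shards[wid]
            grad = X[idx].T @ (X[idx] @ server.params - y[idx]) / len(idx)
            server.push(grad)
            server.clock(wid)
    print("parameter error:", np.linalg.norm(server.params - w_true))


if __name__ == "__main__":
    run()

Setting STALENESS to 0 recovers bulk synchronous execution, while letting it grow unboundedly recovers fully asynchronous updates; the bounded middle ground is one example of how an algorithmic property (error tolerance of iterative-convergent inference) and a systems property (tolerance of stragglers and network delay) can be traded off in concert.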