Machine Learning Systems for Highly Distributed and Rapidly Growing Data

Published May 10, 2019, 22:31
The usability and practicality of machine learning are largely determined by two factors: low latency and low cost. However, achieving both is very challenging when machine learning relies on real-world data that are rapidly growing and highly distributed (e.g., training a face recognition model using pictures stored across many data centers globally).

In this talk, I will present my work on building low-latency and low-cost machine learning systems that enable efficient processing of real-world, large-scale data. I will describe a system-level approach inspired by the general characteristics of machine learning algorithms, model structures, and training/serving data. In line with this approach, I will first present a system that provides both low-latency and low-cost machine learning serving (inference) over large-scale, continuously growing datasets (e.g., videos). Shifting the focus to model training, I will then present a system that makes machine learning training over geo-distributed datasets as fast as training within a single data center. Finally, I will discuss our ongoing efforts to tackle a fundamental and largely overlooked problem: machine learning training over skewed data partitions (e.g., facial images collected by cameras in different countries).

See more at microsoft.com/en-us/research/v...