Microsoft Research
Published on March 13, 2018, 15:38
Machine learning has become one of the most exciting research areas in the world, with a wide range of applications. However, there is a noticeable gap between theory and practice. On one hand, simple algorithms such as stochastic gradient descent (SGD) work very well in practice but lack satisfactory theoretical explanations. On the other hand, algorithms from the theory community, despite their solid guarantees, tend to be less efficient than the techniques widely used in practice, which are usually hand-tuned or designed ad hoc from intuition.
In this talk, I would like to discuss my efforts to bridge theory and practice from two directions. The first direction is “practice to theory”, i.e., explaining and analyzing existing algorithms and empirical observations in machine learning. I will first briefly discuss how SGD escapes saddle points, and then present a two-phase convergence analysis of SGD for a two-layer neural network with ReLU activation.
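The saddle-point phenomenon mentioned above can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the analysis from the talk. It uses a hand-picked function f(x, y) = x²/2 + y⁴/4 − y²/2, whose origin is a strict saddle. It also models SGD's stochastic gradients as exact gradients plus isotropic Gaussian noise, which is a simplification, since real minibatch noise is generally neither isotropic nor Gaussian. The names `f`, `grad`, and `descend`, the step size, and the noise level are all illustrative choices.

```python
import random


def f(x, y):
    """Toy objective: (0, 0) is a strict saddle; (0, 1) and (0, -1) are minima with f = -1/4."""
    return x**2 / 2 + y**4 / 4 - y**2 / 2


def grad(x, y):
    """Exact gradient of f. The Hessian at the origin is diag(1, -1)."""
    return x, y**3 - y


def descend(x, y, lr=0.1, steps=500, noise=0.0, seed=0):
    """Gradient descent. With noise > 0, each gradient gets i.i.d. Gaussian noise,
    a crude stand-in (an assumption) for the stochastic gradients SGD sees."""
    rng = random.Random(seed)
    for _ in range(steps):
        gx, gy = grad(x, y)
        gx += noise * rng.gauss(0.0, 1.0)
        gy += noise * rng.gauss(0.0, 1.0)
        x, y = x - lr * gx, y - lr * gy
    return x, y


# Started exactly at the saddle, exact gradient descent never moves: the gradient is zero.
print(descend(0.0, 0.0))  # (0.0, 0.0)

# With a little gradient noise, the iterate is pushed off the saddle and the
# negative-curvature direction (y) amplifies the perturbation until it reaches a minimum.
x, y = descend(0.0, 0.0, noise=0.01)
print(round(abs(y), 2), round(f(x, y), 3))
```

Near the origin the noisy update multiplies y by roughly 1 + lr = 1.1 per step, so even a tiny perturbation grows geometrically. The iterate then settles near one of the minima (0, ±1), where f = −1/4.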
The other direction is “theory to practice”, i.e., using deep theoretical tools to obtain new, better, and practical algorithms. Along this direction, I will introduce our new algorithm, Harmonica, which uses Fourier analysis and compressed sensing to tune hyperparameters. Harmonica supports parallel sampling and works well for tuning neural networks with 30+ hyperparameters.
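The combination of Fourier analysis and compressed sensing for hyperparameter tuning can be sketched in a few dozen lines. This is a simplified, single-stage illustration under stated assumptions, not the authors' Harmonica implementation. The assumptions are:

- binary hyperparameters are encoded as ±1 bits;
- `validation_loss` is a hypothetical black-box objective that secretly depends on only four of ten bits;
- the surrogate is a degree-2 polynomial in the parity (Fourier-Walsh) basis, fitted with a hand-written coordinate-descent Lasso as the sparse-recovery step;
- `harmonica_stage`, `samples`, `lam`, and `top` are illustrative names and values.

Each sampled configuration is evaluated independently, which is what makes parallel sampling possible.

```python
import itertools
import random

N_BITS = 10  # hypothetical: 10 binary hyperparameters, each encoded as -1 or +1


def validation_loss(x):
    """Hypothetical objective (lower is better): a sparse, low-degree polynomial in 4 of the 10 bits."""
    return 2.0 + 0.8 * x[0] * x[3] - 0.5 * x[1] + 0.3 * x[2]


def monomials(n, degree):
    """All index sets S with 1 <= |S| <= degree."""
    return [s for d in range(1, degree + 1) for s in itertools.combinations(range(n), d)]


def chi(s, x):
    """Parity (Fourier basis) function chi_S(x) = prod_{i in S} x[i]; x may be a list or a dict."""
    p = 1
    for i in s:
        p *= x[i]
    return p


def lasso_cd(features, y, lam, sweeps=100):
    """Coordinate-descent Lasso for min (1/2m)||y - F w||^2 + lam * ||w||_1.
    Every +/-1 parity column has squared norm m, which simplifies the update."""
    m, k = len(y), len(features[0])
    w = [0.0] * k
    r = list(y)  # residual y - F w
    for _ in range(sweeps):
        for j in range(k):
            rho = sum(features[i][j] * r[i] for i in range(m)) / m + w[j]
            new = max(abs(rho) - lam, 0.0) * (1 if rho > 0 else -1)  # soft-thresholding
            delta = new - w[j]
            if delta:
                for i in range(m):
                    r[i] -= delta * features[i][j]
                w[j] = new
    return w


def harmonica_stage(n=N_BITS, samples=100, degree=2, lam=0.05, top=3, seed=0):
    """One simplified stage: sample, fit a sparse Fourier surrogate, fix the important bits."""
    rng = random.Random(seed)
    basis = monomials(n, degree)
    # Uniformly random configurations; these evaluations are independent and could run in parallel.
    xs = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(samples)]
    ys = [validation_loss(x) for x in xs]
    mean = sum(ys) / samples
    feats = [[chi(s, x) for s in basis] for x in xs]
    w = lasso_cd(feats, [v - mean for v in ys], lam)
    # Keep the `top` monomials with the largest |coefficient|.
    order = sorted(range(len(basis)), key=lambda j: -abs(w[j]))[:top]
    terms = [(basis[j], w[j]) for j in order]
    important = sorted({i for s, _ in terms for i in s})

    def surrogate(assign):
        return sum(c * chi(s, assign) for s, c in terms)

    # Brute-force only the few important bits under the learned sparse surrogate.
    candidates = (dict(zip(important, bits))
                  for bits in itertools.product((-1, 1), repeat=len(important)))
    return terms, min(candidates, key=surrogate)


terms, assignment = harmonica_stage()
print(terms)
print(assignment)
```

With this hypothetical objective, the largest Lasso coefficients should land on x0·x3, x1, and x2. The stage then fixes bits 0 to 3 and leaves the other six free for a later stage or another tuner. The full method, as described in the talk, goes beyond this single pass.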
See more at microsoft.com/en-us/research/v...