On the difficulty of training recurrent and deep neural networks

Published July 27, 2016, 1:21
Deep learning is quickly becoming a popular subject in machine learning. Much of this success is due to advances made in how these models are trained; many questions, however, remain unanswered. In this talk I will give a short review of existing approaches, focusing on two distinct topics. The first concerns training recurrent neural models, and specifically the notorious vanishing and exploding gradient problem described in Bengio et al. (1994). I will explore these issues from several perspectives: analytically, geometrically, and through intuitions from dynamical systems theory. These perspectives suggest hypotheses about the underlying causes of these phenomena, which in turn lead to heuristic solutions that seem to work well in practice. The second theme of the talk is natural gradient as an alternative to stochastic gradient descent for learning. I will describe links between natural gradient and other recently proposed optimization techniques such as Hessian-Free optimization, Krylov Subspace Descent, and TONGA. I will discuss the specific properties of natural gradient that should help during training, and touch on efficient implementation and practical rules of thumb for using the algorithm. I hope to see many of you there.
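As a rough illustration of the vanishing and exploding gradient problem mentioned above, the sketch below (not from the talk, assumptions labeled in the comments) backpropagates a gradient through a linear recurrence with numpy: the gradient norm shrinks or grows roughly geometrically with the spectral radius of the recurrent weight matrix. The `clip_by_norm` helper shows one widely used heuristic for the exploding case, gradient norm clipping; the function names and parameters are illustrative, not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def bptt_gradient_norm(spectral_radius, T=50, n=20):
    """Backpropagate through T steps of a linear recurrence h_t = W h_{t-1}.

    Each step multiplies the gradient by W^T, so its norm behaves roughly
    like (spectral radius of W)^T: it vanishes if the radius is < 1 and
    explodes if it is > 1.
    """
    W = rng.standard_normal((n, n))
    W *= spectral_radius / max(abs(np.linalg.eigvals(W)))  # rescale spectral radius
    grad = rng.standard_normal(n)
    for _ in range(T):
        grad = W.T @ grad  # one step of backpropagation through time
    return np.linalg.norm(grad)

print(bptt_gradient_norm(0.9))   # vanishing: norm shrinks toward 0
print(bptt_gradient_norm(1.1))   # exploding: norm grows very large

def clip_by_norm(grad, threshold=1.0):
    """Gradient norm clipping: rescale the gradient if its norm exceeds a
    threshold, a common heuristic against exploding gradients."""
    norm = np.linalg.norm(grad)
    return grad if norm <= threshold else grad * (threshold / norm)
```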