Policy Optimization as Predictable Online Learning Problems: Imitation Learning and Beyond

Published November 28, 2018, 19:10
Efficient policy optimization is fundamental to solving real-world reinforcement learning problems, where agent-environment interactions can be costly. In this talk, I will discuss my recent research toward improving policy optimization efficiency from the perspective of online learning. The use of online learning to analyze policy optimization was pioneered by Ross et al., who proposed reducing imitation learning to adversarial online learning problems. However, as I will discuss, this reduction loses information: the policy optimization problem is not truly adversarial but is instead predictable from past information. Based on this observation, I will present conditions for the last-iterate convergence of value aggregation for imitation learning. Furthermore, I will show how this predictable information can be leveraged to design algorithms that speed up both imitation learning and reinforcement learning.
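The value-aggregation idea can be illustrated with a small toy sketch. This is not the method from the talk: the scalar policy parameter, the quadratic per-round loss, and the hypothetical function target(theta) = a + b * theta are assumptions chosen only for illustration. In this model each round's loss is determined by the policy just played, so the loss sequence is predictable rather than adversarial, and value aggregation reduces to follow-the-leader on the accumulated losses.

```python
# Toy sketch under stated assumptions (hypothetical model, not the talk's algorithm):
# value aggregation viewed as follow-the-leader.
# Round-n loss: l_n(theta) = (theta - target(theta_n))**2, where the
# hypothetical target(theta) = a + b * theta stands in for how the current
# policy's state distribution shifts what the learner is asked to match.


def target(theta: float, a: float, b: float) -> float:
    """Hypothetical policy-dependent target; b measures sensitivity to the policy."""
    return a + b * theta


def value_aggregation(a: float, b: float, theta0: float = 0.0, rounds: int = 200) -> list[float]:
    """Run follow-the-leader on the aggregated losses l_1, ..., l_n.

    Minimizing sum_i (theta - t_i)**2 over theta yields the mean of the
    targets t_i collected so far, so each update is a running average.
    Returns the sequence of played parameters, starting with theta0.
    """
    thetas = [theta0]
    targets: list[float] = []
    for _ in range(rounds):
        # The new loss is revealed by the policy just played (predictable, not adversarial).
        targets.append(target(thetas[-1], a, b))
        # Follow-the-leader: minimize the sum of all losses seen so far.
        thetas.append(sum(targets) / len(targets))
    return thetas


if __name__ == "__main__":
    fast = value_aggregation(a=1.0, b=0.5)   # fixed point a / (1 - b) = 2.0
    slow = value_aggregation(a=0.1, b=0.9)   # fixed point a / (1 - b) = 1.0
    print(f"b=0.5: last iterate {fast[-1]:.3f} (fixed point 2.0)")
    print(f"b=0.9: last iterate {slow[-1]:.3f} (fixed point 1.0)")
```

With a = 1.0 and b = 0.5 the fixed point is a / (1 - b) = 2.0, and in this toy model the last iterate's error shrinks roughly like n^(b - 1). With b = 0.9 the same 200 rounds leave a much larger gap, which hints at why conditions on how strongly the loss depends on the current policy matter for last-iterate convergence.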

View slides and more at microsoft.com/en-us/research/v...