Policy Optimization as Predictable Online Learning Problems: Imitation Learning and Beyond

Published November 28, 2018, 19:10
Efficient policy optimization is fundamental to solving real-world reinforcement learning problems, where agent-environment interactions can be costly. In this talk, I will discuss my recent research on improving the efficiency of policy optimization from the perspective of online learning. The use of online learning to analyze policy optimization was pioneered by Ross et al., who proposed reducing imitation learning to an adversarial online learning problem. However, as I will discuss, this reduction loses information: the policy optimization problem is not truly adversarial, but is instead predictable from past information. Based on this observation, I will present conditions for the last-iterate convergence of value aggregation in imitation learning. Furthermore, I will show how this predictable information can be leveraged to design better algorithms that speed up both imitation learning and reinforcement learning.
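
As a toy illustration of why predictability can help, the sketch below compares plain online gradient descent with an optimistic variant that uses the previous gradient as a hint for the next one. This is a generic predictable-sequence technique chosen for illustration, not necessarily the algorithm presented in the talk. The function names, the linear losses, the interval [-1, 1], and the step size are all assumptions made for this example.

```python
from typing import Sequence


def project(x: float, lo: float = -1.0, hi: float = 1.0) -> float:
    """Clip a decision onto the feasible interval [lo, hi] (assumed domain)."""
    return max(lo, min(hi, x))


def online_gradient_descent(grads: Sequence[float], eta: float) -> float:
    """Plain online gradient descent on linear losses f_t(x) = g_t * x.

    Treats the gradient sequence as adversarial: no prediction is used.
    Returns the cumulative loss.
    """
    x = 0.0
    total = 0.0
    for g in grads:
        total += g * x              # loss revealed after playing x
        x = project(x - eta * g)    # standard gradient step
    return total


def optimistic_gradient_descent(grads: Sequence[float], eta: float) -> float:
    """Optimistic variant: take an extra step along a predicted gradient.

    The hint for round t is the gradient seen in round t - 1 (zero at the
    start), an assumption that suits slowly changing loss sequences.
    Returns the cumulative loss.
    """
    y = 0.0      # secondary iterate, updated only with true gradients
    hint = 0.0   # predicted gradient for the upcoming round
    total = 0.0
    for g in grads:
        x = project(y - eta * hint)  # play using the prediction
        total += g * x
        y = project(y - eta * g)     # correct with the true gradient
        hint = g                     # predict that the next gradient repeats
    return total


if __name__ == "__main__":
    grads = [1.0] * 20  # a perfectly predictable sequence
    print("plain:     ", online_gradient_descent(grads, eta=0.1))
    print("optimistic:", optimistic_gradient_descent(grads, eta=0.1))
```

On a constant gradient sequence such as `[1.0] * 20` with `eta = 0.1`, the hinted learner reaches the boundary of the interval one round earlier and so accumulates a lower total loss. Against a truly adversarial sequence, the hint carries no such guaranteed advantage.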

View slides and more at microsoft.com/en-us/research/v...