Policy Optimization as Predictable Online Learning Problems: Imitation Learning and Beyond

Microsoft Research

Published November 28, 2018, 19:10

Efficient policy optimization is fundamental to solving real-world reinforcement learning problems, where agent-environment interactions can be costly. In this talk, I will discuss my recent research on improving the efficiency of policy optimization from the perspective of online learning. The use of online learning to analyze policy optimization was pioneered by Ross et al., who proposed reducing imitation learning to adversarial online learning problems. However, as I will discuss, this reduction loses information: the policy optimization problem is not truly adversarial but is instead predictable from past information. Based on this observation, I will present conditions for the last-iterate convergence of value aggregation for imitation learning. Furthermore, I will show how this predictable information can be leveraged to design better algorithms that speed up imitation learning and reinforcement learning.
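To make the idea of "predictable" online learning concrete, here is a minimal sketch of generic optimistic online gradient descent on a toy problem. It is not the algorithm presented in the talk. The linear losses f_t(x) = g_t * x on the interval [-1, 1], the cosine gradient sequence, the step size, and all names (`optimistic_ogd`, `hints`, and so on) are assumptions chosen for illustration. The learner receives a guess of the next gradient before acting; with an all-zero guess the procedure reduces to plain (adversarial) online gradient descent.

```python
import math


def clip(x, lo=-1.0, hi=1.0):
    """Project a scalar onto the interval [lo, hi]."""
    return max(lo, min(hi, x))


def optimistic_ogd(grads, hints, eta=0.1):
    """Optimistic online gradient descent on linear losses f_t(x) = g_t * x.

    hints[t] is the learner's guess of grads[t], made before grads[t]
    is revealed. Returns the total loss sum_t g_t * x_t.
    """
    y = 0.0      # base iterate, updated only with true gradients
    total = 0.0
    for g, m in zip(grads, hints):
        x = clip(y - eta * m)  # act on the predicted gradient
        total += g * x         # suffer the loss for this round
        y = clip(y - eta * g)  # correct the base iterate with the true gradient
    return total


# Hypothetical toy sequence: gradients drift slowly, so the past is informative.
T = 200
grads = [math.cos(0.05 * t) for t in range(T)]

no_hint = [0.0] * T              # ignore predictability: plain online gradient descent
last_grad = [0.0] + grads[:-1]   # predict that the next gradient equals the previous one
perfect = list(grads)            # idealized oracle, for comparison only

loss_plain = optimistic_ogd(grads, no_hint)
loss_last = optimistic_ogd(grads, last_grad)
loss_perfect = optimistic_ogd(grads, perfect)
print(loss_plain, loss_last, loss_perfect)
```

In this toy setting, a learner with perfect hints never does worse than the plain learner, because each round it plays the iterate that has already moved against the true gradient. How much a realistic hint such as the previous gradient helps depends on how slowly the losses change.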

View slides and more at microsoft.com/en-us/research/v...
