Microsoft Research
Published April 20, 2018, 22:29
Deep reinforcement learning (RL) has shown promising results for learning complex sequential decision-making behaviors in various environments. However, most successes have been confined to simulation, and results in real-world applications such as robotics remain limited, largely due to the poor sample efficiency of typical deep RL algorithms. I will introduce temporal difference models (TDMs), an extension of goal-conditioned value functions that enables model-based planning at multiple time resolutions. TDMs generalize traditional predictive models, bridge the gap between model-based and off-policy model-free RL, and yield substantial improvements in sample efficiency without introducing asymptotic performance loss.
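To make the idea of a goal- and horizon-conditioned value function concrete, here is a minimal sketch of a TDM-style Bellman backup in a toy 1-D chain environment. The environment, the distance-based reward, and all names are illustrative assumptions for this sketch, not details from the talk; in practice TDMs are trained with function approximation and off-policy data rather than the tabular dynamic program shown here.

```python
# Hypothetical toy example: tabular TDM-style backup on a 1-D chain.
# Q[s, a, g, tau] estimates -|s_tau - g|: the negative distance to
# goal g after taking action a in state s and then acting optimally
# for tau more steps.
import numpy as np

N_STATES = 5          # states 0..4 on a chain
ACTIONS = [-1, 0, 1]  # step left, stay, step right
MAX_TAU = 2           # planning horizons 0..MAX_TAU

def step(s, a):
    # Deterministic chain dynamics, clipped at the boundaries.
    return min(max(s + a, 0), N_STATES - 1)

Q = np.zeros((N_STATES, len(ACTIONS), N_STATES, MAX_TAU + 1))

# TDM-style Bellman backup:
#   tau == 0: Q = -|s' - g|                  (reward only at the deadline)
#   tau  > 0: Q = max_a' Q(s', a', g, tau-1) (no per-step reward)
for tau in range(MAX_TAU + 1):
    for s in range(N_STATES):
        for ai, a in enumerate(ACTIONS):
            s2 = step(s, a)
            for g in range(N_STATES):
                if tau == 0:
                    Q[s, ai, g, tau] = -abs(s2 - g)
                else:
                    Q[s, ai, g, tau] = Q[s2, :, g, tau - 1].max()

# Greedy first action from state 0 toward goal 4 with horizon MAX_TAU:
best = ACTIONS[int(np.argmax(Q[0, :, 4, MAX_TAU]))]
print(best)  # → 1 (step right toward the goal)
```

Because the horizon tau is an input, a single Q-function answers "how close can I get to g in tau steps?" for every horizon at once, which is what lets a planner query it at multiple time resolutions.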
See more at microsoft.com/en-us/research/v...