Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning

Published May 22, 2019, 21:17
Probabilistic Q-learning is a promising approach to balancing exploration and exploitation in reinforcement learning.
However, existing implementations have significant limitations: they either fail to incorporate uncertainty about the long-term consequences of actions, or they ignore fundamental dependencies between state-action values implied by the Bellman equation. Both problems result in sub-optimal exploration. As a solution, we develop Successor Uncertainties (SU), a probabilistic Q-learning method free of the aforementioned problems. SU outperforms existing baselines on tabular problems and on the Atari benchmark suite. Overall, SU is an improved and scalable probabilistic Q-learning method with better properties than its predecessors at no extra cost.
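
For context, the dependency the abstract refers to comes from the Bellman equation, which ties each state-action value to the values of its successor states; beliefs about Q therefore cannot be treated as independent across state-action pairs. A minimal statement in standard notation (not taken from the video itself):

    Q^{\pi}(s,a) = \mathbb{E}\left[ r(s,a) + \gamma \, Q^{\pi}(s',a') \right],
    \quad s' \sim P(\cdot \mid s,a), \; a' \sim \pi(\cdot \mid s')

Any posterior over Q-values that ignores this coupling, e.g. one placing an independent distribution on each Q(s,a), misrepresents the uncertainty structure the equation imposes.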

See more at microsoft.com/en-us/research/v...