Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning

Published 22 May 2019, 21:17
Probabilistic Q-learning is a promising approach to balancing exploration and exploitation in reinforcement learning.
However, existing implementations have significant limitations: they either fail to incorporate uncertainty about the long-term consequences of actions, or they ignore fundamental dependencies between state-action values implied by the Bellman equation. These problems result in sub-optimal exploration. As a solution, we develop Successor Uncertainties (SU), a probabilistic Q-learning method free of the aforementioned problems. SU outperforms existing baselines on tabular problems and on the Atari benchmark suite. Overall, SU is an improved and scalable probabilistic Q-learning method with better properties than its predecessors at no extra cost.
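To make the core idea concrete, here is a minimal tabular sketch, not the paper's implementation; all names, shapes, and constants below are illustrative assumptions. It shows how modelling Q-values as linear in successor features lets a single Gaussian posterior over reward weights induce correlated, Bellman-consistent uncertainty across all state-action values, which Thompson sampling can then exploit:

```python
import numpy as np

# Illustrative sketch of the Successor Uncertainties idea (assumed names,
# not the paper's code). Q-values are modelled as Q(s, a) = psi(s, a)^T w,
# where the successor features psi obey a Bellman-style recursion, so a
# posterior over the reward weights w induces *correlated* uncertainty
# across state-action values, as the Bellman equation requires.

n_states, n_actions = 5, 2
gamma = 0.9
rng = np.random.default_rng(0)

phi = np.eye(n_states)                           # one-hot state features phi(s)
psi = np.zeros((n_states, n_actions, n_states))  # successor features psi(s, a)

w_mean = np.zeros(n_states)   # posterior mean over reward weights (toy prior)
w_cov = np.eye(n_states)      # posterior covariance (toy prior)

def td_update_psi(s, a, s_next, a_next, lr=0.1):
    """Temporal-difference update of the successor features:
    psi(s, a) <- psi(s, a) + lr * (phi(s) + gamma * psi(s', a') - psi(s, a))."""
    target = phi[s] + gamma * psi[s_next, a_next]
    psi[s, a] += lr * (target - psi[s, a])

def sample_q():
    """Thompson sampling: one draw of w yields a complete, mutually
    consistent Q-function over all states and actions."""
    w = rng.multivariate_normal(w_mean, w_cov)
    return psi @ w                               # shape (n_states, n_actions)

# Example: sample a Q-function once, then act greedily under that sample.
q = sample_q()
action = int(np.argmax(q[0]))                    # greedy action in state 0
```

Because a single draw of w fixes every Q(s, a) jointly, acting greedily under the sample explores coherently over multiple steps, rather than perturbing each state-action value independently.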

See more at microsoft.com/en-us/research/v...