Research talk: Breaking the deadly triad with a target network

Microsoft Research

Published January 25, 2022, 1:40

Speaker: Shangtong Zhang, PhD Student, Oxford University

The deadly triad refers to the instability of an off-policy reinforcement learning (RL) algorithm when it employs function approximation and bootstrapping simultaneously; it is a major challenge in off-policy RL. Join PhD student Shangtong Zhang, from the WhiRL group at the University of Oxford, to learn how the target network can be used as a tool for theoretically breaking the deadly triad. The talk covers three topics: a theoretical account of the conventional wisdom that a target network stabilizes training; a novel target network update rule that augments the commonly used Polyak-averaging style update with two projections; and the use of a target network in linear off-policy RL algorithms, in both prediction and control settings and in both discounted and average-reward Markov decision processes.
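
To make the idea of a Polyak-averaging update "augmented with two projections" more concrete, here is a minimal Python sketch. It is not taken from the talk: the placement of the projections (one on the online weights before averaging, one on the averaged result), the choice of L2 balls, and the names `project_l2_ball`, `projected_polyak_update`, `inner_radius`, and `outer_radius` are all assumptions made for illustration. The exact rule and its conditions are given in the talk itself.

```python
import numpy as np


def project_l2_ball(x, radius):
    """Project x onto the L2 ball of the given radius centred at the origin."""
    norm = np.linalg.norm(x)
    if norm <= radius:
        return x.copy()
    return x * (radius / norm)


def projected_polyak_update(target, online, beta, inner_radius, outer_radius):
    """One hypothetical target-weight update: Polyak averaging plus two projections.

    Assumption (not confirmed by the source): the online weights are projected
    first, then the averaged result is projected again.
    """
    clipped_online = project_l2_ball(online, inner_radius)
    # Standard Polyak averaging step towards the (projected) online weights.
    mixed = target + beta * (clipped_online - target)
    # Second projection keeps the target weights in a bounded set.
    return project_l2_ball(mixed, outer_radius)


# Example with linear value-function weights of dimension 3.
target = np.zeros(3)
online = np.array([30.0, 40.0, 0.0])  # deliberately large online weights
new_target = projected_polyak_update(
    target, online, beta=0.1, inner_radius=10.0, outer_radius=5.0
)
print(new_target)
```

In this sketch both projections bound how far the target weights can move, which is one intuitive way to read the claim that a target network stabilizes training; with `beta=1.0` and very large radii the update reduces to copying the online weights.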

Learn more about the 2021 Microsoft Research Summit: Aka.ms/researchsummit
