Q-learning with Flow-Matching Policies

Microsoft Research

Published May 26, 2026, 12:37

Expressive policies such as diffusion and flow-matching policies have recently driven progress in robotic manipulation because they can model complex action distributions and generalize from just a handful of demonstrations. But most are still trained purely with supervised imitation learning. Optimizing them with off-policy reinforcement learning remains challenging, which limits their real-world applicability to tasks that require online self-improvement and adaptation. In this talk, I will discuss approaches for making off-policy RL work with flow-matching policies.

Speaker Bio: Qiyang (Colin) Li is a PhD student at UC Berkeley advised by Prof. Sergey Levine. His research interests include reinforcement learning and robot learning, with a focus on leveraging offline prior experience for online exploration. Before that, he was an undergraduate student at the University of Toronto advised by Prof. Roger Grosse.

Find seminar details and upcoming talks: microsoft.com/en-us/research/e...
