Counterfactual Multi-Agent Policy Gradients

Published 28 August 2017, 19:08
Many real-world problems, such as network packet routing and the coordination of autonomous vehicles, are naturally modelled as cooperative multi-agent systems. In this talk, I give an overview of some of the key challenges in developing reinforcement learning methods that can efficiently learn decentralised policies for such systems. I also propose a new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents’ policies. In addition, to address the challenge of multi-agent credit assignment, it uses a counterfactual baseline that marginalises out a single agent’s action while keeping the other agents’ actions fixed. COMA also uses a critic representation that allows the counterfactual baseline to be computed efficiently in a single forward pass. Finally, I present results evaluating COMA on the testbed of StarCraft unit micromanagement.
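To make the counterfactual baseline concrete, here is a minimal Python/NumPy sketch (not the authors' code; the function name and array layout are hypothetical). It assumes the centralised critic has already produced, in a single forward pass, the Q-values for every action one agent could take while the other agents' actions stay fixed, and computes that agent's counterfactual advantage:

```python
import numpy as np

def coma_advantage(q_values, policy_probs, action_taken):
    """Counterfactual advantage for a single agent (illustrative sketch).

    q_values:     shape (n_actions,) -- the centralised critic's Q(s, u)
                  for each of this agent's candidate actions, with the
                  other agents' actions held fixed (COMA's critic
                  representation yields all of these in one forward pass).
    policy_probs: shape (n_actions,) -- the agent's current policy
                  pi(u | tau) over its own actions.
    action_taken: index of the action the agent actually executed.
    """
    # Counterfactual baseline: marginalise out this agent's action under
    # its own policy, keeping the other agents' actions fixed.
    baseline = np.dot(policy_probs, q_values)
    # Advantage = Q of the joint action actually taken, minus the baseline.
    return q_values[action_taken] - baseline

# Toy usage with 4 candidate actions for one agent.
q = np.array([1.0, 0.5, -0.2, 0.3])   # critic's Q per candidate action
pi = np.array([0.4, 0.3, 0.2, 0.1])   # agent's policy over its actions
print(coma_advantage(q, pi, action_taken=0))
```

Each decentralised actor would then follow a policy gradient weighted by this advantage, so an agent is credited only for the difference its own action made relative to its average action under the current policy.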

See more on this video at microsoft.com/en-us/research/v...