Computing Mistake-Exploiting Strategies in Very Large Zero-Sum Games

176
Опубликовано 17 августа 2016, 1:38
Game theory has been influential in many fields, such as economics, psychology, and evolutionary biology. This influence extends to machine learning, as games are embedded in many learning and planning problems, and robust algorithms for solving these problems often coincide with algorithms for computing optimal strategies in games. The key solution concept in a two-player zero-sum game is the minimax strategy, which is a player's optimal strategy under the assumption that the player's opponent is perfectly adversarial and always uses a 'best-response' strategy. One problem with the minimax strategy is that it does not allow for the possibility that the player's opponent might make a mistake, and consequently may fail to fully exploit mistakes by the opponent. A refinement of the minimax strategy that takes maximum advantage of the opponent's mistakes is called the lexicographically optimal strategy, and was first proposed by Melvin Dresher in 1961. In this talk, I will describe a novel algorithm for computing lexicographically optimal strategies in a two-player zero-sum game. Unlike previous methods, our algorithm is efficient even when one player has an very large number of strategies, so long as a best-response oracle is available for that player. To the best of our knowledge, we are the first to describe an efficient algorithm for computing a lexicographically optimal strategy in very large games. Our approach is to repeatedly invoke a no-regret algorithm, along with the best-response oracle, in order to compute a lexicographically optimal strategy over several stages. The connection between no-regret algorithms and game theoretic equilibria has been well-established in the literature, and our algorithm can be viewed as a contribution to this family of results. I will also describe how our algorithm can also be applied to solve challenging problems in reinforcement learning. This is joint work with Rob Schapire.
автотехномузыкадетское