Thompson Sampling in Combinatorial Multi-armed Bandits with Independent Arms

1 338
24.8
Опубликовано 12 декабря 2022, 20:34
2022 Data-driven Optimization Workshop: Thompson Sampling in Combinatorial Multi-armed Bandits with Independent Arms

Speaker: Siwei Wang, Microsoft Research Asia

Existing methods of combinatorial multi-armed bandits mainly focus on the UCB approach. To make the algorithm efficient, they usually use the sum of upper confidence bounds of base arms to represent the upper confidence bound of a super arm. However, when the outcomes of different base arms are independent, this upper confidence bound could be much larger than necessary, which leads to a much higher regret upper bound (in regret minimization problems) or complexity upper bound (in pure exploration problems). To deal with this challenge, we explore the idea of Thompson Sampling (TS) that uses independent random samples instead of the upper confidence bounds, and design TS-based algorithms for both the regret minimization problems and the pure exploration problems. In TS-based algorithms, the sum of independent random samples within a super arm will not exceed its tight upper confidence bound with high probability. Hence it solves the above challenge, and achieves lower regret/complexity upper bounds than existing efficient UCB-based algorithms.
автотехномузыкадетское