Explore or Exploit? Reflections on an Ancient Dilemma in the Age of the Web

Microsoft Research

Published June 27, 2016, 20:08

Learning and decision-making problems often boil down to a balancing act between exploring new possibilities and exploiting the best known one. For more than fifty years, the multi-armed bandit problem has been the predominant theoretical model for investigating these issues. The emergence of the Web as a platform for sequential experimentation at a massive scale is leading to shifts in our understanding of this fundamental problem as we confront new challenges and opportunities. I will present two recent pieces of work addressing these challenges. The first concerns the misalignment of incentives in systems, such as online product reviews and citizen science platforms, that depend on a large population of users to explore a space of options. The second concerns situations in which the learner's actions consume one or more limited-supply resources, as when a ticket seller experiments with prices for an event with limited seating.
