Microsoft Research
Published June 27, 2016, 20:08
Learning and decision-making problems often boil down to a balancing act between exploring new possibilities and exploiting the best known one. For more than fifty years, the multi-armed bandit problem has been the predominant theoretical model for investigating these issues. The emergence of the Web as a platform for sequential experimentation at a massive scale is leading to shifts in our understanding of this fundamental problem as we confront new challenges and opportunities. I will present two recent pieces of work addressing these challenges. The first concerns the misalignment of incentives in systems, such as online product reviews and citizen science platforms, that depend on a large population of users to explore a space of options. The second concerns situations in which the learner's actions consume one or more limited-supply resources, as when a ticket seller experiments with prices for an event with limited seating.
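To make the explore/exploit balance concrete, here is a minimal sketch of UCB1, a standard multi-armed bandit policy. It is not the method from either piece of work described above. The Bernoulli arms, their success probabilities, the horizon, and the function name `ucb1` are hypothetical choices for illustration. Each round, the policy picks the arm with the highest empirical mean plus a confidence bonus. The mean term favors exploitation. The bonus favors arms that have been tried rarely.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Simulate UCB1 on hypothetical Bernoulli arms.

    Returns (pull counts per arm, total reward collected).
    """
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    k = len(arm_means)
    counts = [0] * k   # number of times each arm has been pulled
    sums = [0.0] * k   # total reward collected from each arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            # Initialization: pull every arm once so each estimate is defined.
            arm = t - 1
        else:
            # Index = empirical mean (exploit) + confidence bonus (explore).
            # The bonus shrinks as an arm is pulled more often.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        # Draw a Bernoulli reward from the chosen arm (simulated environment).
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return counts, total


counts, total = ucb1([0.1, 0.4, 0.8], horizon=10000)
print(counts, total)
```

Over a long horizon, the pull counts should concentrate on the best arm, because the bonus for the weaker arms decays only logarithmically in `t`. This sketch deliberately leaves out both complications raised in the abstract. It has no separate population of users whose incentives differ from the designer's, and it has no limited-supply resource (for example, a fixed number of seats) that each pull consumes.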