Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits

Microsoft Research

Published July 8, 2016, 0:11

IMS-Microsoft Research Workshop: Foundations of Data Science - Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits
We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of K actions in response to the observed context, and observes the reward only for that chosen action. Our method assumes access to an oracle for solving fully supervised cost-sensitive classification problems and achieves the statistically optimal regret guarantee with only Õ(√(KT)) oracle calls across all T rounds. By doing so, we obtain the most practical contextual bandit learning algorithm amongst approaches that work for general policy classes. We further conduct a proof-of-concept experiment which demonstrates the excellent computational and prediction performance of (an online variant of) our algorithm relative to several baselines. [Joint work with Daniel Hsu, Satyen Kale, John Langford, Lihong Li and Rob Schapire]
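
To make the problem setting concrete, here is a minimal Python sketch of the contextual bandit protocol described above: observe a context, pick one of K actions, and see the reward for that action only. It is not the algorithm from the talk. It uses plain epsilon-greedy exploration with inverse-propensity-scored (IPS) reward estimates, and a per-context lookup table stands in for the cost-sensitive classification oracle. The toy environment (the context is an integer and the matching action pays reward 1), the function names, and all parameter values are assumptions made for illustration.

```python
import random


def ips_estimates(num_actions, chosen, reward, prob):
    """Build an inverse-propensity-scored reward vector.

    Only the chosen action's reward is observed; dividing it by the
    probability of choosing that action gives an unbiased estimate of
    the full (unobserved) reward vector.
    """
    est = [0.0] * num_actions
    est[chosen] = reward / prob
    return est


def run_epsilon_greedy(T=2000, K=3, epsilon=0.1, seed=0):
    """Run T rounds of epsilon-greedy on a toy problem; return mean reward."""
    rng = random.Random(seed)
    # Stand-in for the oracle (assumption): accumulated IPS reward
    # estimates per (context, action), queried with an argmax.
    totals = [[0.0] * K for _ in range(K)]
    total_reward = 0.0
    for _ in range(T):
        x = rng.randrange(K)  # observe the context
        greedy = max(range(K), key=lambda a: totals[x][a])
        # Mix uniform exploration with the greedy choice.
        probs = [epsilon / K] * K
        probs[greedy] += 1.0 - epsilon
        a = rng.choices(range(K), weights=probs)[0]
        # Toy environment (assumption): reward 1 iff action matches context.
        r = 1.0 if a == x else 0.0
        total_reward += r
        est = ips_estimates(K, a, r, probs[a])
        for b in range(K):
            totals[x][b] += est[b]
    return total_reward / T


if __name__ == "__main__":
    print(run_epsilon_greedy())
```

In this sketch the "oracle" is consulted on every round. The result stated in the abstract is that the statistically optimal regret can be achieved with only Õ(√(KT)) oracle calls over all T rounds.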
