Sparse Stochastic Bandits

Published 21 June 2016, 20:48
Many learning problems, such as choosing which ad to put on a website or which movie to recommend to a user, can be cast as stochastic bandit problems. The key feature of these problems is that learning and decision making are interleaved: the actions chosen influence the information received, so a learning strategy must carefully balance choosing actions to collect information against choosing actions to accumulate reward. Today's bandit problems can involve tens of thousands of actions or more. A popular approach to scaling bandit algorithms to such large action spaces is to assume that the expected payoff is well approximated by a linear function of the actions' features. Using more features improves this approximation, but when the number of features is also large (perhaps also in the tens of thousands), learning becomes slow. A common further assumption is that the true, unknown parameter vector is sparse. In supervised learning, this assumption is known to speed up learning. Can sparsity of the parameter vector be exploited in bandit problems? How can one design algorithms that take advantage of it?

In this talk we present an approach, based on two reductions between learning problems, that answers these questions (and, in particular, answers the first one positively). The first reduction, which is more or less standard, reduces the sparse stochastic bandit problem to the design of tight confidence sets for sparse linear regression. The second reduces the design of such confidence sets to the design of low-regret algorithms for linear prediction under the squared loss in an online, *adversarial* framework. We demonstrate the effectiveness of the approach on numerical examples.

The talk is based on the paper Abbasi-Yadkori, Y., Pál, D., and Szepesvári, Cs., "Online-to-confidence-set conversions and application to sparse stochastic bandits", AISTATS, pp. 1-9, 2012 (ualberta.ca/~szepesva/papers/o...), with some new ideas, time permitting. Only general knowledge of machine learning is assumed.
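To make the first reduction concrete: it runs an optimistic ("UCB-style") bandit algorithm on top of whatever confidence sets are available. Below is a minimal, hypothetical Python sketch, using the standard ridge-regression ellipsoid with a fixed radius as a stand-in for the tighter, sparsity-aware confidence set constructed in the paper (in the full algorithm the radius would grow slowly with the round index).

```python
import numpy as np

def optimistic_linear_bandit(actions, theta_star, n_rounds=500,
                             noise=0.1, reg=1.0, beta=2.0, seed=0):
    """Optimism in the face of uncertainty for a stochastic linear bandit.

    Hypothetical sketch: the confidence set here is the usual
    ridge-regression ellipsoid with a fixed radius `beta`.  The paper
    replaces it with a tighter, sparsity-aware set obtained from an
    online learner; the loop below is unchanged by that swap.
    """
    rng = np.random.default_rng(seed)
    d = actions.shape[1]
    V = reg * np.eye(d)      # regularized Gram matrix of played actions
    b = np.zeros(d)          # sum of reward-weighted played actions
    opt = (actions @ theta_star).max()
    cum_regret = []
    for _ in range(n_rounds):
        theta_hat = np.linalg.solve(V, b)   # ridge estimate
        V_inv = np.linalg.inv(V)
        # Optimistic index: estimate plus an exploration bonus that is
        # large in directions the played actions have not yet covered.
        bonus = np.sqrt(np.einsum('ij,jk,ik->i', actions, V_inv, actions))
        a = actions[np.argmax(actions @ theta_hat + beta * bonus)]
        reward = a @ theta_star + noise * rng.standard_normal()
        V += np.outer(a, a)
        b += reward * a
        cum_regret.append(opt - a @ theta_star)
    return np.cumsum(cum_regret)

# Toy run: a sparse 20-dimensional parameter with two active coordinates.
rng = np.random.default_rng(0)
theta_star = np.zeros(20)
theta_star[[3, 7]] = [1.0, -0.5]
actions = rng.standard_normal((50, 20))
actions /= np.linalg.norm(actions, axis=1, keepdims=True)
print(optimistic_linear_bandit(actions, theta_star)[-1])
```

Plugging in a tighter, sparsity-aware confidence set changes only the estimate and the exploration bonus; the optimistic action-selection loop itself stays the same.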
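The second reduction, the online-to-confidence-set conversion, can be stated schematically as follows (constants and the exact deviation term are omitted here; the paper gives the precise form). If an online algorithm produces predictions and enjoys a regret bound for squared-loss prediction against all linear predictors, then with high probability the true parameter lies in the set of parameters whose cumulative squared deviation from those predictions is controlled by the regret bound:

```latex
% Schematic online-to-confidence-set conversion (constants omitted).
% \hat{y}_t : prediction of the online learner at round t
% X_t       : feature vector observed at round t
% \rho_n    : regret bound of the online learner after n rounds
% With probability at least 1 - \delta, the true parameter \theta_*
% belongs to
C_n = \Big\{ \theta \in \mathbb{R}^d :
        \sum_{t=1}^{n} \bigl( \hat{y}_t - \langle \theta, X_t \rangle \bigr)^2
        \le \beta_n(\delta) \Big\},
\qquad
\beta_n(\delta) = O\bigl( \rho_n + \log(1/\delta) \bigr).
```

A sparsity-aware online learner has a regret bound growing roughly with the sparsity level and only logarithmically with the ambient dimension, which is what keeps the resulting confidence set, and hence the bandit regret, small.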