PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Minimax policies for adversarial and stochastic bandits
Jean-Yves Audibert and Sébastien Bubeck
COLT 2009.


We fill in a long open gap in the characterization of the minimax rate for the multi-armed bandit problem. Concretely, we remove an extraneous logarithmic factor in the previously known upper bound and propose a new family of randomized algorithms based on an implicit normalization, as well as a new analysis. We also consider the stochastic case, and prove that an appropriate modification of the upper confidence bound policy UCB1 (Auer et al., 2002) achieves the distribution-free optimal rate while still having a distribution-dependent rate logarithmic in the number of plays.
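The modified UCB-style policy mentioned above replaces UCB1's confidence bonus with one that depends on the horizon and the number of arms. The following is a minimal illustrative sketch, not the authors' exact algorithm: it uses the index form mean + sqrt(max(log(n/(K·s)), 0)/s) for an arm pulled s times over a horizon of n rounds with K arms, simulated here on hypothetical Bernoulli arms.

```python
import math
import random

def modified_ucb(arm_means, horizon, rng):
    """Sketch of a horizon-aware UCB policy on a Bernoulli bandit.

    Index of arm j after s pulls with empirical mean m:
        m + sqrt(max(log(horizon / (K * s)), 0) / s)
    (an assumed form; the bonus vanishes for heavily pulled arms).
    Returns the pull count of each arm.
    """
    k = len(arm_means)
    pulls = [0] * k        # times each arm was played
    sums = [0.0] * k       # cumulative reward per arm
    for t in range(horizon):
        if t < k:
            i = t  # initialisation: pull every arm once
        else:
            def index(j):
                mean = sums[j] / pulls[j]
                bonus = math.sqrt(
                    max(math.log(horizon / (k * pulls[j])), 0.0) / pulls[j]
                )
                return mean + bonus
            i = max(range(k), key=index)  # play the highest index
        reward = 1.0 if rng.random() < arm_means[i] else 0.0
        pulls[i] += 1
        sums[i] += reward
    return pulls

if __name__ == "__main__":
    # Hypothetical two-armed instance: the better arm should dominate.
    counts = modified_ucb([0.9, 0.5], horizon=2000, rng=random.Random(0))
    print(counts)
```

On a well-separated instance like the one above, the policy concentrates its plays on the better arm while the shrinking bonus keeps occasional exploration of the other.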

EPrint Type: Article
Subjects: Theory & Algorithms
ID Code: 6622
Deposited By: Sébastien Bubeck
Deposited On: 08 March 2010