PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Algorithms for infinitely many-armed bandits
Yizao Wang, Jean-Yves Audibert and Rémi Munos
In: NIPS 2008, 8-13 Dec 2008, Vancouver, Canada.

Abstract

We consider multi-armed bandit problems where the number of arms is larger than the possible number of experiments. We make a stochastic assumption on the mean reward of a newly selected arm, which characterizes its probability of being a near-optimal arm. Our assumption is weaker than in previous works. We describe algorithms based on upper confidence bounds applied to a restricted set of randomly selected arms and provide upper bounds on the resulting expected regret. We also derive a lower bound which matches (up to a logarithmic factor) the upper bound in some cases.
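
The general idea of running an upper-confidence-bound rule on a restricted, randomly selected set of arms can be illustrated with a minimal sketch. This is not the paper's exact procedure: the standard UCB1 index, the Bernoulli rewards, the uniform distribution of arm means, and the names `ucb_on_random_subset`, `n_arms`, and `draw_arm_mean` are all assumptions made here for illustration, and the subset size is left as a free parameter rather than tuned as an analysis might prescribe.

```python
import math
import random


def ucb_on_random_subset(n_rounds, n_arms, draw_arm_mean, rng):
    """Run a UCB1-style rule on n_arms arms drawn at random from a reservoir.

    draw_arm_mean is a hypothetical callable that returns the mean reward
    of a newly drawn arm (assumption: Bernoulli rewards with that mean).
    Returns (total_reward, arm_means, pull_counts).
    """
    # Restrict attention to a finite set of randomly selected arms.
    means = [draw_arm_mean(rng) for _ in range(n_arms)]
    pulls = [0] * n_arms
    sums = [0.0] * n_arms
    total = 0.0

    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            # Initialisation: pull each selected arm once.
            arm = t - 1
        else:
            # Pick the arm with the largest upper confidence bound (UCB1 index).
            arm = max(
                range(n_arms),
                key=lambda k: sums[k] / pulls[k]
                + math.sqrt(2.0 * math.log(t) / pulls[k]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
        total += reward

    return total, means, pulls


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumption: arm means are uniform on [0, 1]; 20 arms out of infinitely many.
    total, means, pulls = ucb_on_random_subset(2000, 20, lambda r: r.random(), rng)
    print("best selected mean:", max(means))
    print("average reward:", total / 2000)
```

The choice of `n_arms` carries the trade-off: with too few arms, none of the selected ones may be near-optimal, and with too many, the budget is spent exploring rather than exploiting.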

EPrint Type: Conference or Workshop Item (Paper)
Subjects: Computational, Information-Theoretic Learning with Statistics; Learning/Statistics & Optimisation; Theory & Algorithms
ID Code: 5067
Deposited By: Jean-Yves Audibert
Deposited On: 24 March 2009