PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Anytime many-armed bandits
Olivier Teytaud, Sylvain Gelly and Michele Sebag
In: CAP 2007, Grenoble (2007).

Abstract

This paper introduces the many-armed bandit problem (ManAB), where the number of arms is large compared to the relevant number of time steps. While the ManAB framework is relevant to many real-world applications, the state of the art does not offer anytime algorithms handling ManAB problems. Both theory and practice suggest that two problem categories must be distinguished: the easy category includes those problems where good arms have reward probability close to 1; the difficult category includes all other problems. Two algorithms, termed FAILURE and MUCBT, are proposed for the ManAB framework. FAILURE and its variants extend the non-anytime approach proposed for the denumerable-armed bandit, and non-asymptotic bounds are shown; FAILURE works very efficiently on easy ManAB problems. Meanwhile, MUCBT deals efficiently with difficult ManAB problems.
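
To make the setting concrete, the following is a minimal sketch of a many-armed Bernoulli bandit in which the number of arms is comparable to the horizon. It is not the FAILURE or MUCBT algorithm from the paper, whose details are not given in this abstract. It only illustrates a generic "stay on an arm until its first failure" rule, which is one simple way to exploit arms whose reward probability is close to 1. The function names (`one_failure_play`, `uniform_play`), the arm probabilities and the horizon are all assumptions chosen for illustration.

```python
import random


def one_failure_play(arm_probs, horizon, rng):
    """Illustrative rule (assumption, not the paper's algorithm):
    keep pulling the current arm until it fails once, then move on."""
    total = 0
    arm = 0
    for _ in range(horizon):
        # Bernoulli reward with success probability arm_probs[arm].
        reward = 1 if rng.random() < arm_probs[arm] else 0
        total += reward
        if reward == 0:
            # Abandon the arm after a failure; wrap around if arms run out.
            arm = (arm + 1) % len(arm_probs)
    return total


def uniform_play(arm_probs, horizon, rng):
    """Baseline: pull an arm chosen uniformly at random at every step."""
    total = 0
    for _ in range(horizon):
        arm = rng.randrange(len(arm_probs))
        total += 1 if rng.random() < arm_probs[arm] else 0
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical "easy" instance: a few arms with reward probability
    # close to 1 hidden among many mediocre arms.
    arms = [0.99 if rng.random() < 0.1 else 0.3 for _ in range(200)]
    horizon = 200  # as many time steps as arms, so no exhaustive exploration
    print("one-failure:", one_failure_play(arms, horizon, random.Random(1)))
    print("uniform:    ", uniform_play(arms, horizon, random.Random(1)))
```

On an instance like this, the stop-at-first-failure rule tends to lock onto a near-perfect arm once it finds one, which matches the intuition behind the "easy" category. When the best arms have reward probability well below 1, every arm fails often and the rule keeps switching, which is one way to see why the difficult category calls for a different approach.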

EPrint Type: Conference or Workshop Item (Paper)
Subjects: Computational, Information-Theoretic Learning with Statistics
ID Code: 3203
Deposited By: Olivier Teytaud
Deposited On: 20 January 2008