PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences.
Odalric-Ambrym Maillard, Rémi Munos and Gilles Stoltz
JMLR Workshop and Conference Proceedings Volume Volume 19: COLT 2011, pp. 497-514, 2011.

Abstract

We consider a Kullback-Leibler-based algorithm for the stochastic multi-armed bandit problem in the case of distributions with finite supports (not necessarily known beforehand), whose asymptotic regret matches the lower bound of \cite{Burnetas96}. Our contribution is to provide a finite-time analysis of this algorithm; we get bounds whose main terms are smaller than the ones of previously known algorithms with finite-time analyses (like UCB-type algorithms).

PDF - Requires Adobe Acrobat Reader or other PDF viewer.
EPrint Type:Article
Project Keyword:Project Keyword UNSPECIFIED
Subjects:Learning/Statistics & Optimisation
Theory & Algorithms
ID Code:8762
Deposited By:Odalric-Ambrym Maillard
Deposited On:21 February 2012