PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Finite-time analysis of multi-armed bandits problems with Kullback-Leibler divergences
Odalric Maillard, Rémi Munos and Gilles Stoltz
COLT 2011.

Abstract

We consider a Kullback-Leibler-based algorithm for the stochastic multi-armed bandit problem in the case of distributions with finite supports (not necessarily known beforehand), whose asymptotic regret matches the lower bound of Burnetas and Katehakis (1996). Our contribution is to provide a finite-time analysis of this algorithm; we obtain bounds whose main terms are smaller than those of previously known algorithms with finite-time analyses (such as UCB-type algorithms).
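
As a rough illustration of the quantity the abstract refers to, the sketch below computes the Kullback-Leibler divergence between two distributions on a shared finite support, and then uses it to build an upper confidence index in the Bernoulli (two-point support) special case. This is not the algorithm analysed in the paper: the function names, the bisection search, and the exploration budget `budget` are assumptions made here for illustration only.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two distributions given as lists over the same finite support."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # convention: 0 * log(0 / q) = 0
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


def bernoulli_kl(a, b):
    """KL divergence between Bernoulli(a) and Bernoulli(b)."""
    return kl_divergence([1.0 - a, a], [1.0 - b, b])


def kl_upper_index(mean, pulls, budget, tol=1e-9):
    """Approximate the largest q in [mean, 1] with pulls * KL(mean, q) <= budget.

    Hypothetical helper: `budget` stands in for an exploration term
    (for example, something of order log t); its exact form is an assumption.
    """
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if pulls * bernoulli_kl(mean, mid) <= budget:
            lo = mid  # mid is still plausible, so search higher
        else:
            hi = mid  # mid is ruled out, so search lower
    return lo
```

With an empirical mean of 0.5, the index returned by `kl_upper_index` shrinks toward 0.5 as `pulls` grows for a fixed `budget`, which is the usual behaviour of an optimistic index: arms that have been sampled more often get tighter confidence bounds.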

EPrint Type: Article
Project Keyword: UNSPECIFIED
Subjects: COMPLACS
ID Code: 8986
Deposited By: Rémi Munos
Deposited On: 21 February 2012