PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

UCB Revisited: Improved Regret Bounds for the Stochastic Multi-Armed Bandit Problem
Peter Auer and Ronald Ortner
Periodica Mathematica Hungarica Volume 61, Number 1-2, pp. 55-65, 2010. ISSN 0031-5303


We consider a modification of the UCB algorithm of Auer et al. [4] for the stochastic multi-armed bandit problem and give an improved bound on its regret with respect to the optimal reward. For the original UCB algorithm, the regret in K-armed bandits after T trials is bounded by const * K log(T) / Delta, where Delta measures the distance between a suboptimal arm and the optimal arm. For the modified UCB algorithm we show an upper bound on the regret of const * K log(T Delta^2) / Delta.

EPrint Type: Article
Subjects: Computational, Information-Theoretic Learning with Statistics; Learning/Statistics & Optimisation; Theory & Algorithms
ID Code: 7082
Deposited By: Ronald Ortner
Deposited On: 01 March 2011