PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Gap-free bounds for multi-armed stochastic bandit.
Anatoli Juditsky, Alexander Nazin, Alexandre Tsybakov and Nicolas Vayatis
In: World Congress of IFAC 2008, Jul 2008, Korea.


We consider the stochastic multi-armed bandit problem with an unknown horizon. We present a randomized decision strategy that updates a probability distribution through a stochastic mirror-descent-type algorithm. We treat two assumptions separately: nonnegative losses, or arbitrary losses satisfying an exponential moment condition. We prove gap-free bounds, optimal up to logarithmic factors, on the excess risk of the time average of the instantaneous losses induced by the choice of a specific action.
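To make the idea of a randomized strategy driven by mirror descent concrete, here is a minimal Python sketch. It is not the algorithm of the paper, whose exact updates, step sizes, and loss estimates are not given in this record. It assumes the entropic (exponential-weights) form of mirror descent on the probability simplex, importance-weighted loss estimates, and a step size that depends only on the current round, so the horizon need not be known in advance. The names `mirror_descent_bandit` and `draw_loss` are hypothetical.

```python
import math
import random


def mirror_descent_bandit(draw_loss, n_arms, n_rounds, seed=0):
    """Entropic mirror descent on the simplex with bandit feedback.

    draw_loss(arm, rng) returns the loss observed for the chosen arm.
    Returns the final sampling distribution and the time-averaged loss.
    Illustrative assumption only, not the method analysed in the paper.
    """
    rng = random.Random(seed)
    cum_est = [0.0] * n_arms          # cumulative importance-weighted loss estimates
    probs = [1.0 / n_arms] * n_arms   # start from the uniform distribution
    total_loss = 0.0

    for t in range(1, n_rounds + 1):
        # Assumed step size: depends on t only, never on the horizon.
        eta = math.sqrt(math.log(n_arms) / (t * n_arms))

        # Gibbs distribution over arms; subtracting the minimum keeps
        # the exponentials in (0, 1] and avoids overflow.
        base = min(cum_est)
        weights = [math.exp(-eta * (g - base)) for g in cum_est]
        norm = sum(weights)
        probs = [w / norm for w in weights]

        # Randomized decision: sample one arm, observe only its loss.
        arm = rng.choices(range(n_arms), weights=probs)[0]
        loss = draw_loss(arm, rng)
        total_loss += loss

        # Unbiased estimate of the full loss vector: nonzero only on
        # the played arm, rescaled by its sampling probability.
        cum_est[arm] += loss / probs[arm]

    return probs, total_loss / n_rounds


def bernoulli_losses(means):
    """Build a loss oracle with Bernoulli losses (a nonnegative-loss case)."""
    def draw_loss(arm, rng):
        return 1.0 if rng.random() < means[arm] else 0.0
    return draw_loss


if __name__ == "__main__":
    probs, avg_loss = mirror_descent_bandit(
        bernoulli_losses([0.5, 0.2, 0.8]), n_arms=3, n_rounds=5000
    )
    print("final distribution:", probs)
    print("time-averaged loss:", avg_loss)
```

The quantity returned as the time-averaged loss is the one whose excess over the best arm's mean loss the abstract calls the excess risk. The Bernoulli oracle is just one example of nonnegative losses; any callable with the same signature can be passed in.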

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics
Learning/Statistics & Optimisation
Theory & Algorithms
ID Code: 3866
Deposited By: Alexandre Tsybakov
Deposited On: 25 February 2008