Gap-free bounds for multi-armed stochastic bandit.
Anatoli Juditsky, Alexander Nazin, Alexandre Tsybakov and Nicolas Vayatis
In: World Congress of IFAC 2008, Jul 2008, Korea.
We consider the stochastic multi-armed bandit problem with an unknown
horizon. We present a randomized decision strategy based on updating a
probability distribution over the actions through a stochastic mirror
descent-type algorithm. We consider two assumptions separately:
nonnegative losses, or arbitrary losses satisfying an exponential moment
condition. We prove gap-free bounds, optimal up to logarithmic factors,
on the excess risk of the time average of the instantaneous losses
incurred by the actions the strategy chooses.
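A minimal Python sketch of a mirror-descent-based randomized bandit strategy follows. It is a generic entropic mirror descent (exponential weights) scheme with importance-weighted loss estimates. It is not necessarily the exact update, averaging step, or step-size schedule analyzed in the paper. The names `gibbs`, `mirror_descent_bandit`, and `draw_loss`, and the temperature schedule `beta0 * sqrt(t)`, are hypothetical choices made for illustration. Because the temperature depends only on the current round t, the sketch runs without knowing the horizon in advance.

```python
import math
import random


def gibbs(z, beta):
    """Entropic mirror map: probabilities proportional to exp(-z_k / beta)."""
    m = min(z)  # shift for numerical stability; the result is unchanged
    w = [math.exp(-(zk - m) / beta) for zk in z]
    s = sum(w)
    return [wk / s for wk in w]


def mirror_descent_bandit(draw_loss, n_arms, horizon, beta0=1.0, seed=0):
    """Play `horizon` rounds; return (per-arm play counts, average loss).

    draw_loss(k, rng) is a hypothetical interface returning a random loss
    for arm k. Only the loss of the chosen arm is observed each round.
    """
    rng = random.Random(seed)
    z = [0.0] * n_arms      # cumulative importance-weighted loss estimates
    counts = [0] * n_arms
    total = 0.0
    for t in range(1, horizon + 1):
        # Assumed temperature schedule: grows like sqrt(t) and needs no horizon.
        beta = beta0 * math.sqrt(t)
        p = gibbs(z, beta)
        k = rng.choices(range(n_arms), weights=p)[0]
        loss = draw_loss(k, rng)
        # Dividing by p[k] makes z an unbiased estimate of cumulative losses.
        z[k] += loss / p[k]
        counts[k] += 1
        total += loss
    return counts, total / horizon


if __name__ == "__main__":
    means = [0.5, 0.4, 0.2]  # Bernoulli loss means; arm 2 is best
    counts, avg = mirror_descent_bandit(
        lambda k, rng: float(rng.random() < means[k]),
        n_arms=3, horizon=5000, beta0=1.0)
    print(counts, avg)
```

The demo uses Bernoulli losses, which fall under the nonnegative-loss assumption. With a suitable `beta0`, one would expect the play counts to concentrate on the arm with the smallest mean loss as the number of rounds grows. The arbitrary-loss case under an exponential moment condition is not covered by this sketch.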