On the loss version of the adversarial multi-armed bandit
Chamy Allenberg and Peter Auer
The Loss Bandit game is the loss variant of the adversarial multi-armed bandit problem. It is played over $T$ iterations. At the beginning of each iteration an adversary assigns losses from $[0,1]$ to each of the $K$ arms. Then, without knowing the adversary's assignments, we must select one of the $K$ arms and suffer the loss assigned to it. We compete against the optimal loss, i.e. the minimal total loss accumulated by the best single arm.
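The protocol above can be sketched as a simple simulation loop. This is a minimal illustration of the game and the regret it defines, not the paper's algorithm; the `adversary` and `learner` callables and their signatures are hypothetical stand-ins.

```python
import random

def loss_bandit_game(T, K, adversary, learner):
    """Simulate the Loss Bandit protocol: each round the adversary fixes
    losses in [0,1] for all K arms, the learner picks an arm without
    seeing them and suffers the loss assigned to that arm."""
    total_loss = 0.0
    cumulative = [0.0] * K       # hidden per-arm totals, used only to compute regret
    for t in range(T):
        losses = adversary(t, K)  # losses[i] in [0, 1], fixed before the pick
        arm = learner(t, K)       # chosen without knowledge of this round's losses
        total_loss += losses[arm]
        for i in range(K):
            cumulative[i] += losses[i]
    optimal_loss = min(cumulative)   # total loss of the best fixed arm in hindsight
    return total_loss - optimal_loss # the learner's regret

# Toy run: an oblivious random adversary against a uniformly random learner
# (both hypothetical; any bandit algorithm could be plugged in as `learner`).
rng = random.Random(0)
regret = loss_bandit_game(
    T=1000, K=5,
    adversary=lambda t, K: [rng.random() for _ in range(K)],
    learner=lambda t, K: rng.randrange(K),
)
```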
In this work we present an optimal upper bound on the regret of the Loss Bandit game. It is the first regret upper bound for this game that is expressed as a function of the optimal loss rather than of the number of iterations $T$.