PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Combinatorial Bandits
Nicolò Cesa-Bianchi and Gábor Lugosi
In: Proceedings of the 22nd Annual Conference on Learning Theory, Montreal, Canada (2009).

Abstract

We study sequential prediction problems in which, at each time instance, the forecaster chooses a binary vector from a fixed set S ⊆ {0,1}^d and suffers a loss that is the sum of the losses of those vector components that are equal to one. The goal of the forecaster is to ensure that, in the long run, the accumulated loss is not much larger than that of the best vector in S. We consider the "bandit" setting, in which the forecaster has access only to the losses of the chosen vectors. We introduce a new general forecaster achieving a regret bound that, for a variety of concrete choices of S, is of order $\sqrt{nd \ln|S|}$, where n is the time horizon. This bound is not improvable in general and is better than previously known bounds. We also point out that computationally efficient implementations exist for various interesting choices of S.
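The protocol described in the abstract can be illustrated with a minimal Python sketch. It is not the forecaster introduced in the paper: it only shows how the loss of a binary vector is formed, what the bandit feedback reveals, and how regret against the best fixed vector is measured. All function names are hypothetical, and the choice of S as the vectors with exactly m ones, as well as the uniformly random forecaster used in the demo, are assumptions made purely for illustration.

```python
import itertools
import math
import random


def make_action_set(d, m):
    """Illustrative S: all binary vectors in {0,1}^d with exactly m ones (an assumption)."""
    return [tuple(1 if i in ones else 0 for i in range(d))
            for ones in itertools.combinations(range(d), m)]


def action_loss(v, component_losses):
    """Loss of a binary vector: sum of the losses of its components equal to one."""
    return sum(loss for bit, loss in zip(v, component_losses) if bit == 1)


def play(S, loss_sequence, choose):
    """Run the bandit protocol and return the forecaster's accumulated loss.

    `choose(S, history)` sees only past (vector, observed loss) pairs,
    never the per-component losses.
    """
    history = []
    total = 0.0
    for component_losses in loss_sequence:
        v = choose(S, history)
        observed = action_loss(v, component_losses)  # the only feedback revealed
        history.append((v, observed))
        total += observed
    return total


def regret(S, loss_sequence, forecaster_total):
    """Accumulated loss of the forecaster minus that of the best fixed vector in S."""
    best = min(sum(action_loss(v, losses) for losses in loss_sequence) for v in S)
    return forecaster_total - best


def bound_order(n, d, S):
    """The quantity sqrt(n * d * ln|S|) quoted in the abstract, without constants."""
    return math.sqrt(n * d * math.log(len(S)))


if __name__ == "__main__":
    rng = random.Random(0)
    d, m, n = 5, 2, 200
    S = make_action_set(d, m)
    losses = [tuple(rng.random() for _ in range(d)) for _ in range(n)]
    # Placeholder forecaster: picks a vector uniformly at random, ignoring feedback.
    total = play(S, losses, lambda actions, history: rng.choice(actions))
    print("regret:", regret(S, losses, total))
    print("sqrt(n d ln|S|):", bound_order(n, d, S))
```

A forecaster that actually learns would replace the placeholder `choose` function and use the `history` of observed scalar losses to bias its future choices; the random player above has no such mechanism and serves only as a baseline.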

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Theory & Algorithms
ID Code: 5985
Deposited By: Nicolò Cesa-Bianchi
Deposited On: 08 March 2010