Multiclass classification with bandit feedback using adaptive regularization
Koby Crammer and Claudio Gentile
In: ICML 2011, June 28th - Jul 2nd 2011, Bellevue, USA.

## Abstract

We present a new multiclass algorithm in the bandit framework, where after making a prediction, the learning algorithm receives only partial feedback, i.e., a single bit of right-or-wrong, rather then the true label. Our algorithm is based on the 2nd-order Perceptron, and uses upper-confidence bounds to trade off exploration and exploitation. We analyze this algorithm in a partial adversarial setting, where instances are chosen adversarially, while the labels are chosen according to a linear probabilistic model, which is also chosen adversarially. We show a regret of O(\sqrt{T}\log T), which improves over the current best bounds of O(T^{2/3}) in the fully adversarial setting. We evaluate our algorithm on nine real-world text classification problems, obtaining state-of-the-art results, even compared with non-bandit online algorithms, especially when label noise is introduced.