PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Active learning in multi-armed bandits
András Antos, Varun Grover and Csaba Szepesvari
In: 19th International Conference on Algorithmic Learning Theory, ALT 2008, Proceedings. Lecture Notes in Computer Science/Lecture Notes in Artificial Intelligence, 5254. (2008) Springer-Verlag, Berlin, Heidelberg, Germany, pp. 287-302. ISBN 978-3-540-87986-2


In this paper we consider the problem of actively learning the mean values of distributions associated with a finite number of options (arms). The algorithm can choose which option to draw the next sample from, with the goal of producing estimates of equally good precision for all the distributions. When an algorithm uses sample means to estimate the unknown values, the optimal solution, assuming full knowledge of the distributions, is to sample each option a number of times proportional to its variance. We propose an incremental algorithm that asymptotically achieves the same loss as this optimal rule. We prove that the excess loss suffered by this algorithm, apart from logarithmic factors, scales as n^{-3/2}, which we conjecture to be the optimal rate. The performance of the algorithm is illustrated on a simple problem.
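The sketch below illustrates the two ideas in the abstract. It is not the authors' algorithm. It assumes the loss is the worst-case mean squared error of the sample means, max_k σ_k²/n_k, which matches "equally good precision for all the distributions". Under that assumption, equalizing σ_k²/n_k = c across arms with Σ_k n_k = n gives the oracle allocation n_k = n·σ_k²/Σ_j σ_j², with loss Σ_j σ_j²/n. The incremental rule is a hypothetical stand-in. It force-samples any arm with fewer than max(`min_pulls`, √t) samples and otherwise pulls the arm with the largest estimated σ̂_k²/n_k. The function names, the √t forcing threshold and `min_pulls` are all illustrative assumptions.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def oracle_allocation(variances: Sequence[float], n: int) -> List[float]:
    """Fractional sample counts n_k = n * var_k / sum(var); needs known variances."""
    total = sum(variances)
    return [n * v / total for v in variances]


def oracle_loss(variances: Sequence[float], n: int) -> float:
    """Worst-case MSE max_k var_k / n_k under the proportional allocation: sum(var) / n."""
    return sum(variances) / n


class RunningStats:
    """Welford's online mean and variance, so each update costs O(1)."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0

    def push(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def variance(self) -> float:
        # Unbiased sample variance; callers guarantee count >= 2.
        return self.m2 / (self.count - 1)


def adaptive_sampling(
    arms: Sequence[Callable[[], float]], n: int, min_pulls: int = 2
) -> Tuple[List[float], List[int]]:
    """Hypothetical incremental allocation rule (an assumption, not the paper's method).

    At step t, any arm with fewer than max(min_pulls, sqrt(t)) samples is forced;
    otherwise the arm with the largest estimated variance per sample is pulled.
    Returns the sample means and the number of samples taken from each arm.
    """
    stats = [RunningStats() for _ in arms]
    for t in range(1, n + 1):
        threshold = max(min_pulls, math.sqrt(t))
        under = [k for k, s in enumerate(stats) if s.count < threshold]
        if under:
            # Forced exploration keeps variance estimates from freezing too low.
            k = min(under, key=lambda j: stats[j].count)
        else:
            # Greedy step toward equalizing var_k / n_k across arms.
            k = max(range(len(arms)), key=lambda j: stats[j].variance() / stats[j].count)
        stats[k].push(arms[k]())
    return [s.mean for s in stats], [s.count for s in stats]


if __name__ == "__main__":
    rng = random.Random(0)
    # Two Gaussian arms with standard deviations 1 and 3 (variances 1 and 9).
    arms = [lambda: rng.gauss(0.0, 1.0), lambda: rng.gauss(5.0, 3.0)]
    n = 20000
    means, counts = adaptive_sampling(arms, n)
    print("oracle counts:", oracle_allocation([1.0, 9.0], n))
    print("adaptive counts:", counts, "means:", means)
```

With variances 1 and 9, the oracle assigns 10% and 90% of the budget, and the adaptive counts should land near that split once the variance estimates settle. The forcing step matters because a purely greedy rule could stop sampling an arm whose variance it happened to underestimate early on.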

EPrint Type: Book Section
Additional Information: (Budapest, Hungary, October 13-16, 2008.)
Project Keyword: UNSPECIFIED
Subjects: Learning/Statistics & Optimisation; Theory & Algorithms
ID Code: 4174
Deposited By: András Antos
Deposited On: 24 March 2009