PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Learning When to Stop Thinking and Do Something
B Póczos, Y Abbasi-Yadkori, Csaba Szepesvari, R Greiner and N Sturtevant
In: ICML-09 (2009).

Abstract

An anytime algorithm can return a response to a given task at essentially any time; typically, the quality of the response improves as time increases. Here, we consider the challenge of learning when to terminate such algorithms on each of a sequence of iid tasks, so as to optimize the expected average reward per unit time. We provide a system for addressing this challenge, which combines the Cross-Entropy method, a global optimizer, with local gradient ascent. This paper theoretically investigates how far the estimated gradient is from the true gradient, then empirically demonstrates that the system is effective by applying it to a toy problem as well as to a real-world face detection task.
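The idea of a global Cross-Entropy search followed by local gradient ascent can be illustrated with a minimal sketch. This is not the authors' system: the quality curve 1 - exp(-rate * tau), the fixed per-task overhead, the single stopping-time parameter tau, the finite-difference gradient, and all function names are assumptions made only for illustration. The objective is the total reward divided by the total time over a sample of iid tasks.

```python
import math
import random


def average_reward_rate(tau, rates, overhead=1.0):
    """Total reward / total time when every task is stopped after tau time units.

    Assumed toy model: a task with parameter r yields quality 1 - exp(-r * tau),
    and each task costs a fixed overhead in addition to the thinking time tau.
    """
    tau = max(tau, 0.0)
    total_reward = sum(1.0 - math.exp(-r * tau) for r in rates)
    total_time = len(rates) * (tau + overhead)
    return total_reward / total_time


def cross_entropy_search(objective, mean, std, rng, iters=30, pop=50, elite=10):
    """Global stage: refit a Gaussian to the best samples at each iteration."""
    for _ in range(iters):
        samples = [max(rng.gauss(mean, std), 0.0) for _ in range(pop)]
        samples.sort(key=objective, reverse=True)
        best = samples[:elite]
        mean = sum(best) / elite
        # Small floor keeps the sampling distribution from collapsing.
        std = math.sqrt(sum((s - mean) ** 2 for s in best) / elite) + 1e-3
    return mean


def gradient_ascent(objective, x, step=0.5, eps=1e-4, iters=200):
    """Local stage: ascent using a central finite-difference gradient estimate."""
    for _ in range(iters):
        grad = (objective(x + eps) - objective(x - eps)) / (2 * eps)
        x = max(x + step * grad, 0.0)
    return x


rng = random.Random(0)
rates = [rng.uniform(0.5, 2.0) for _ in range(500)]  # one rate per iid task


def objective(tau):
    return average_reward_rate(tau, rates)


tau_global = cross_entropy_search(objective, mean=5.0, std=3.0, rng=rng)
tau_star = gradient_ascent(objective, tau_global)
print(round(tau_star, 3), round(objective(tau_star), 4))
```

In this sketch the Cross-Entropy stage only has to land near a good stopping time; the gradient stage then refines it. In a setting with noisy gradient estimates, the accuracy of that estimate matters, which is the question the abstract says the paper analyzes theoretically.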

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Theory & Algorithms
ID Code: 6346
Deposited By: Csaba Szepesvari
Deposited On: 08 March 2010