PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Analysis of Optimistic Algorithms for the Exploration/Exploitation Trade-Off
Peter Auer
In: Foundations of Computational Mathematics, Hong Kong (2008).

Abstract

We consider decision problems in which decisions, each associated with some gain, must be made repeatedly. When making a decision, one may rely on the information collected so far (exploitation), or one may try to collect further information (exploration). The risk of an exploitative decision is that a better decision, one with higher gain, goes unrecognized because of insufficient information. The risk of an explorative decision is that it is typically suboptimal. Optimistic algorithms deal with this exploration/exploitation trade-off implicitly, by assuming the most favourable gain process that is consistent with the information collected so far. Decisions are made based on this optimistic assumption. In my talk I will show how such optimistic algorithms can be analysed, using examples from the bandit problem and the reinforcement learning problem. While the generic part of these analyses is very similar, the more technical part needs to bound the distance between the optimistically assumed gain process and the actual gain process.
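To make the optimistic principle concrete, the following is a minimal sketch of a UCB1-style index strategy for a bandit problem with Bernoulli gains. The abstract does not say which algorithms the talk covers, so the choice of UCB1, the function name `ucb1`, the Bernoulli gain model, and the parameters `arm_means`, `horizon`, and `seed` are all assumptions made for illustration. Each arm's optimistic estimate is its empirical mean gain plus a confidence width, and the arm with the largest optimistic estimate is chosen.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Illustrative UCB1-style strategy on simulated Bernoulli arms.

    arm_means are hypothetical success probabilities used only to
    simulate gains; the strategy itself never reads them directly.
    Returns how often each arm was chosen.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms      # number of times each arm was chosen
    totals = [0.0] * n_arms    # sum of observed gains per arm

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Choose every arm once so that each estimate is defined.
            arm = t - 1
        else:
            # Optimistic index: empirical mean plus a confidence width
            # that shrinks as the arm is sampled more often.
            arm = max(
                range(n_arms),
                key=lambda i: totals[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        # Simulated gain (assumption: Bernoulli with the arm's mean).
        gain = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += gain

    return counts
```

Calling `ucb1([0.2, 0.8], horizon=2000)` returns the number of times each of the two simulated arms was chosen. Exploration is never scheduled explicitly here: a rarely chosen arm keeps a wide confidence width, so its optimistic estimate stays high until enough information has been collected to rule it out.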

EPrint Type: Conference or Workshop Item (Invited Talk)
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics; Theory & Algorithms
ID Code: 4839
Deposited By: Peter Auer
Deposited On: 24 March 2009