PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Online Regret Bounds for a New Reinforcement Learning Algorithm
Peter Auer and Ronald Ortner
In: 1st Austrian Cognitive Vision Workshop, 31 Jan 2005, Zell an der Pram, Austria.

There is a more recent version of this eprint available.

Abstract

We present a new learning algorithm for undiscounted finite-step reinforcement learning with restarts. Our algorithm is based on upper confidence bounds and achieves logarithmic online regret with respect to an optimal policy.

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics; Theory & Algorithms
ID Code: 1316
Deposited By: Ronald Ortner
Deposited On: 28 November 2005
