PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Online Regret Bounds for a New Reinforcement Learning Algorithm
Peter Auer and Ronald Ortner
In: 1st Austrian Cognitive Vision Workshop, 31 Jan 2005, Zell an der Pram, Austria.

This is the latest version of this eprint.

Abstract

We present a new learning algorithm for undiscounted finite-step reinforcement learning with restarts. Our algorithm is based on upper confidence bounds and achieves logarithmic online regret with respect to an optimal policy.

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics; Theory & Algorithms
ID Code: 1884
Deposited By: Ronald Ortner
Deposited On: 29 December 2005
