PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Competitive Reinforcement Learning
Peter Auer
In: Models of Behavioural Learning Workshop (at NIPS 2005), 10 Dec 2005, Whistler, Canada.


We present a new algorithm for undiscounted reinforcement learning. In contrast to the usual convergence analysis, we bound the regret of our algorithm: we compare the total reward received by our algorithm during learning with the total reward of an optimal strategy. In fact, we do not distinguish a specific learning phase but bound the regret of our algorithm for any number of steps. We are able to show that the regret scales logarithmically with the number of steps – as for the much simpler bandit problem. Methodologically, we use upper confidence bounds on the expected total reward to tackle the exploration-exploitation trade-off which the online reinforcement learning algorithm is facing.
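The abstract draws an analogy to the much simpler bandit problem, where the same upper-confidence-bound idea applies. As a hedged illustration of that principle (this is the classic UCB1 index policy for Bernoulli bandits, not the paper's reinforcement-learning algorithm; the arm probabilities and step count below are invented for demonstration):

```python
import math
import random

def ucb1(arm_means, n_steps, seed=0):
    """Run the UCB1 index policy on a Bernoulli bandit.

    arm_means are the success probabilities, unknown to the learner.
    This toy bandit setup is an illustrative assumption, not the
    undiscounted MDP setting of the paper.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # cumulative reward per arm
    total = 0.0
    for t in range(1, n_steps + 1):
        if t <= k:
            arm = t - 1   # initialise: play each arm once
        else:
            # UCB index = empirical mean + confidence radius;
            # the optimistic arm is played, balancing exploration
            # and exploitation.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts

total_reward, pull_counts = ucb1([0.2, 0.5, 0.8], 10_000)
```

Because suboptimal arms are pulled only O(log T) times, the regret against always playing the best arm grows logarithmically in the number of steps, which is the scaling the abstract claims for the full reinforcement-learning setting.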

EPrint Type: Conference or Workshop Item (Invited Talk)
Project Keyword: UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics; Theory & Algorithms
ID Code: 2066
Deposited By: Peter Auer
Deposited On: 30 January 2006