PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Asymptotic Performance Guarantee for Online Reinforcement Learning with the Least-Squares Regression
Mohammad Azar and Bert Kappen
In: The Learning Workshop (Snowbird), 13-16 April 2011, Fort Lauderdale, FL, USA.


We introduce a new online reinforcement learning algorithm, least-squares action-preference learning (LS-APL), for stochastic infinite-horizon Markov decision processes. We provide a non-trivial asymptotic performance-loss bound for LS-APL that holds with probability 1, provided the stochastic Markov process induced by the learning policy satisfies a certain mixing assumption. The bound differs from existing ones mainly in that it applies to problems with a limited sampling budget per iteration. To illustrate the applicability of LS-APL, we assess its performance on the optimal replacement problem.

EPrint Type: Conference or Workshop Item (Poster)
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics
Learning/Statistics & Optimisation
Theory & Algorithms
ID Code: 8387
Deposited By: Mohammad Azar
Deposited On: 02 December 2011