PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Reinforcement learning algorithms for MDPs
Csaba Szepesvari
Technical Report TR09-13, 2009.


This article presents a survey of reinforcement learning algorithms for Markov Decision Processes (MDPs). In the first half of the article, the problem of value estimation is considered. Here we start by describing the idea of bootstrapping and temporal difference learning. Next, we compare incremental and batch algorithmic variants and discuss how the choice of function approximation method affects the success of learning. In the second half, we describe methods that target the problem of learning to control an MDP. Here online and active learning are discussed first, followed by a description of direct and actor-critic methods.
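To make "bootstrapping" and "temporal difference learning" concrete, the following is a minimal sketch of tabular TD(0) policy evaluation, assuming a hypothetical two-state deterministic chain (s0 -> s1 -> terminal, reward 1 on termination). The function names, the chain, and the step-size and discount values are illustrative assumptions, not taken from the report. The key point is that each update target uses the current estimate of the next state's value rather than the full observed return.

```python
from typing import List, Optional, Tuple

# A transition is (state, reward, next_state); next_state None marks termination.
Transition = Tuple[int, float, Optional[int]]


def td0_episode(values: List[float], episode: List[Transition],
                alpha: float, gamma: float) -> None:
    """Apply tabular TD(0) updates along one episode, modifying values in place."""
    for state, reward, next_state in episode:
        # Bootstrapping: the target borrows the current estimate of the next
        # state's value instead of waiting for the complete return.
        next_value = 0.0 if next_state is None else values[next_state]
        td_error = reward + gamma * next_value - values[state]
        values[state] += alpha * td_error


def evaluate_chain(num_episodes: int = 200, alpha: float = 0.5,
                   gamma: float = 0.9) -> List[float]:
    """Estimate state values of a hypothetical deterministic chain s0 -> s1 -> terminal."""
    values = [0.0, 0.0]
    # Reward 1 is received only on the transition into the terminal state.
    episode: List[Transition] = [(0, 0.0, 1), (1, 1.0, None)]
    for _ in range(num_episodes):
        td0_episode(values, episode, alpha, gamma)
    return values


if __name__ == "__main__":
    # The true values are V(s1) = 1 and V(s0) = gamma * 1 = 0.9;
    # the estimates approach them as episodes accumulate.
    print(evaluate_chain())
```

Because the chain is deterministic, a constant step size suffices here; with stochastic transitions or rewards, a decaying step size would typically be needed for the estimates to converge.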

EPrint Type: Article
Subjects: Computational, Information-Theoretic Learning with Statistics
Learning/Statistics & Optimisation
Theory & Algorithms
ID Code: 6373
Deposited By: Csaba Szepesvari
Deposited On: 08 March 2010