PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Least Squares Temporal Difference Methods: An Analysis Under General Conditions
Huizhen Yu
(2010). Technical Report, University of Helsinki.

Abstract

We consider approximate policy evaluation for finite-state, finite-action Markov decision processes (MDPs) with the least squares temporal difference algorithm, LSTD(λ), in an exploration-enhanced off-policy learning context. For the discounted cost criterion, we establish that off-policy LSTD(λ) converges almost surely under mild, minimal conditions. We also analyze other convergence and boundedness properties of the iterates involved in the algorithm. Our analysis draws on theories of both finite-space Markov chains and weak Feller Markov chains on topological spaces. Our results can be applied to other temporal difference algorithms and MDP models. As examples, we give a convergence analysis of an off-policy TD(λ) algorithm and extensions to MDPs with compact state and action spaces.
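
For readers unfamiliar with the estimator the abstract refers to, the following is a minimal sketch of standard on-policy LSTD(λ) with linear features; it is not the report's exploration-enhanced off-policy algorithm, and the names lstd_lambda and phi as well as the ridge term reg are illustrative assumptions.

import numpy as np

def lstd_lambda(trajectory, phi, gamma=0.99, lam=0.9, reg=1e-6):
    """Sketch of on-policy LSTD(lambda) with linear features.

    trajectory: sequence of (state, reward, next_state) transitions
    phi: feature map, state -> np.ndarray of shape (d,)
    Returns theta such that phi(s) @ theta approximates the value of s.
    """
    d = phi(trajectory[0][0]).shape[0]
    A = np.zeros((d, d))
    b = np.zeros(d)
    z = np.zeros(d)  # eligibility trace
    for s, r, s_next in trajectory:
        z = gamma * lam * z + phi(s)                    # decay trace, add current features
        A += np.outer(z, phi(s) - gamma * phi(s_next))  # accumulate LSTD matrix
        b += z * r                                      # accumulate LSTD vector
    # Ridge term guards against a singular A on short trajectories
    # (a practical assumption, not part of the basic algorithm).
    return np.linalg.solve(A + reg * np.eye(d), b)

The off-policy variant analyzed in the report additionally reweights the eligibility trace and the temporal-difference terms with importance-sampling ratios between the target and behavior policies; its almost sure convergence under the mild conditions described above is the report's main result.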

EPrint Type: Monograph (Technical Report)
Project Keyword: UNSPECIFIED
Subjects: Theory & Algorithms
ID Code: 8064
Deposited By: Huizhen Yu
Deposited On: 17 March 2011