PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Convergence of Least Squares Temporal Difference Methods Under General Conditions
Huizhen Yu
In: ICML 2010 (2010).


We consider approximate policy evaluation for finite state and action Markov decision processes (MDPs) in the off-policy learning context, using the simulation-based least squares temporal difference algorithm, LSTD(lambda). For the discounted cost criterion, we establish that off-policy LSTD(lambda) converges almost surely under mild, minimal conditions. We also analyze other convergence and boundedness properties of the iterates involved in the algorithm and, based on them, suggest a modification of its practical implementation. Our analysis uses theories of both finite space Markov chains and Markov chains on topological spaces.
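For orientation, the following is a minimal sketch of one textbook variant of off-policy LSTD(lambda) with per-decision importance-sampling ratios; it is an illustration only, not necessarily the exact iterates analyzed in the paper. The names `phi` (feature map), `rho` (the ratio pi(a|s)/mu(a|s) of the action actually taken), and the ridge term `reg` are assumptions of this sketch.

```python
import numpy as np

def off_policy_lstd_lambda(transitions, phi, gamma, lam, n_features, reg=1e-6):
    """Accumulate the LSTD(lambda) statistics A, b over one trajectory and
    solve A theta = b.  `transitions` is a sequence of (s, r, s_next, rho),
    where rho is the importance-sampling ratio pi(a|s)/mu(a|s) of the action
    taken under the behavior policy (rho = 1 recovers on-policy LSTD(lambda))."""
    A = np.zeros((n_features, n_features))
    b = np.zeros(n_features)
    z = np.zeros(n_features)                     # eligibility trace
    for s, r, s_next, rho in transitions:
        z = rho * (gamma * lam * z + phi(s))     # importance-weighted trace
        A += np.outer(z, phi(s) - gamma * phi(s_next))
        b += z * r
    # A small ridge term (an assumption of this sketch) guards against a
    # near-singular A in short simulation runs.
    return np.linalg.solve(A + reg * np.eye(n_features), b)

# Usage: a deterministic 2-state chain, evaluated on-policy (all rho = 1).
# With gamma = 0.5 the true values are V(0) = 4/3, V(1) = 2/3, and tabular
# features make the LSTD(lambda) fixed point exact for any lambda.
phi = lambda s: np.eye(2)[s]                     # tabular (one-hot) features
traj = [(0, 1.0, 1, 1.0), (1, 0.0, 0, 1.0)] * 100
theta = off_policy_lstd_lambda(traj, phi, gamma=0.5, lam=0.8, n_features=2)
```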

EPrint Type: Conference or Workshop Item (Talk)
Project Keyword: UNSPECIFIED
Subjects: Theory & Algorithms
ID Code: 8062
Deposited By: Huizhen Yu
Deposited On: 17 March 2011