PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

A Least Squares Q-Learning Algorithm for Optimal Stopping Problems
Huizhen Yu and Dimitri Bertsekas
(2006) Technical Report. Lab for Information and Decision Systems (LIDS), MIT.


We consider the solution of discounted optimal stopping problems using linear function approximation methods. A Q-learning algorithm for such problems, proposed by Tsitsiklis and Van Roy, is based on the method of temporal differences and stochastic approximation. We propose alternative algorithms, which are based on projected value iteration ideas and least squares. We prove the convergence of some of these algorithms and discuss their properties.
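As a rough illustration of the projected value iteration and least squares idea mentioned in the abstract, the sketch below repeatedly fits a linear approximation of the continuation cost to one-step targets computed along a single sampled trajectory. It is a simplified batch variant under stated assumptions, not the algorithm of the report: the names `solve_2x2` and `fitted_q_stopping`, the two-dimensional feature map `phi`, the per-step cost `g`, the stopping cost `c`, and the discount factor `alpha` are all hypothetical choices made for this example.

```python
def solve_2x2(a, b):
    """Solve the 2x2 linear system a r = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if abs(det) < 1e-12:
        raise ValueError("singular normal equations; trajectory too short?")
    r0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    r1 = (a[0][0] * b[1] - a[1][0] * b[0]) / det
    return [r0, r1]


def fitted_q_stopping(traj, phi, g, c, alpha, iters):
    """Least squares fit of Q(x) ~ phi(x) . r for a discounted stopping problem.

    traj  : list of states visited by the unstopped Markov chain (assumed given)
    phi   : state -> pair of feature values (hypothetical 2-feature map)
    g     : (state, next_state) -> cost of continuing one step
    c     : state -> cost of stopping
    alpha : discount factor in (0, 1)
    iters : number of fitting sweeps over the trajectory
    """
    r = [0.0, 0.0]
    for _ in range(iters):
        a = [[0.0, 0.0], [0.0, 0.0]]  # sum of phi(x) phi(x)^T
        b = [0.0, 0.0]                # sum of phi(x) * target
        for x, x_next in zip(traj, traj[1:]):
            f = phi(x)
            f_next = phi(x_next)
            q_next = f_next[0] * r[0] + f_next[1] * r[1]
            # One-step target: continue now, then take the cheaper of
            # stopping or continuing at the next state.
            y = g(x, x_next) + alpha * min(c(x_next), q_next)
            for i in range(2):
                b[i] += f[i] * y
                for j in range(2):
                    a[i][j] += f[i] * f[j]
        # Solve the normal equations for this sweep's least squares fit.
        r = solve_2x2(a, b)
    return r
```

A call might look like `fitted_q_stopping(traj, lambda x: (1.0, float(x)), g, c, 0.9, 50)`, where `traj` is a simulated state sequence. The min inside the target is what distinguishes the stopping problem from ordinary policy evaluation; whether a given sample-based iteration converges depends on conditions this sketch does not check.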

EPrint Type: Monograph (Technical Report)
Additional Information: A preliminary version of this extended report appeared in the European Control Conference, 2007 (ECC'07).
Subjects: Theory & Algorithms
ID Code: 3019
Deposited By: Huizhen Yu
Deposited On: 29 July 2007