Q-learning Algorithms for Optimal Stopping Based on Least Squares
Huizhen Yu and Dimitri Bertsekas
In: European Control Conference (ECC'07), 2-5 Jul 2007, Kos, Greece.
We consider the solution of discounted optimal stopping problems using linear function approximation methods. A Q-learning algorithm for such problems, proposed by Tsitsiklis and Van Roy, is based on the method of temporal differences and stochastic approximation. We propose alternative algorithms, which are based on projected value iteration ideas and least squares. We prove the convergence of some of these algorithms and discuss their properties.
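As a rough illustration of the temporal-difference approach the abstract attributes to Tsitsiklis and Van Roy, the following is a minimal sketch and not the authors' code. It assumes a cost-minimization formulation on a finite-state Markov chain, where the Q-factor for continuing at state x is approximated by the linear form Phi[x] @ r. The function name, its parameters, the step-size rule, and the two-state example are all hypothetical choices made for this illustration.

```python
import numpy as np

def q_learning_stopping(P, g, c, Phi, alpha=0.9, steps=20000, seed=0):
    """TD(0)-style Q-learning sketch for a discounted optimal stopping problem.

    Assumed (hypothetical) inputs:
      P   : (n, n) transition matrix of the underlying chain
      g   : (n,)   one-stage cost of continuing from each state
      c   : (n,)   cost of stopping at each state
      Phi : (n, k) feature matrix; Phi[x] @ r approximates the
                   Q-factor of continuing at state x
    """
    rng = np.random.default_rng(seed)
    n, k = Phi.shape
    r = np.zeros(k)
    x = 0
    for t in range(steps):
        # Simulate one transition of the chain.
        y = rng.choice(n, p=P[x])
        # Sampled target: continuation cost plus the discounted cost of
        # the better of stopping or continuing at the next state.
        target = g[x] + alpha * min(c[y], Phi[y] @ r)
        # Diminishing step size (an arbitrary choice for this sketch).
        step = 1.0 / (1.0 + t / 100.0)
        # Stochastic approximation update along the feature direction.
        r += step * Phi[x] * (target - Phi[x] @ r)
        x = y
    return r

# Hypothetical two-state example with one-hot (tabular) features.
P = np.array([[0.5, 0.5], [0.5, 0.5]])
g = np.array([1.0, 1.0])
c = np.array([0.0, 100.0])
Phi = np.eye(2)
r = q_learning_stopping(P, g, c, Phi)
```

In this toy example the exact Q-factor of continuing is the same in both states and solves Q = 1 + 0.9 * (0.5 * 0 + 0.5 * Q), so the learned weights should settle near 1 / 0.55. The least squares alternatives proposed in the paper are not reproduced here.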
EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: Unspecified
Subjects: Theory & Algorithms
Deposited By: Huizhen Yu
Deposited On: 08 February 2008