PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Regularized Fitted Q-iteration: Application to Planning
Amir massoud Farahmand, Mohammad Ghavamzadeh, Shie Mannor and Csaba Szepesvari
In: EWRL 2008, 30 June - 3 July 2008, Lille, France.

Abstract

We consider planning in a Markovian decision problem, i.e., the problem of finding a good policy given access to a generative model of the environment. We propose to use fitted Q-iteration with penalized (or regularized) least-squares regression as the regression subroutine to address the problem of controlling model complexity. The algorithm is presented in detail for the case when the function space is a reproducing kernel Hilbert space underlying a user-chosen kernel function. We derive bounds on the quality of the solution and argue that data-dependent penalties can lead to almost optimal performance. A simple example is used to illustrate the benefits of using a penalized procedure.
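The procedure outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a finite action set, a Gaussian kernel over concatenated state-action inputs, and a fixed penalty coefficient `lam` (the paper argues for data-dependent penalties, which are not modelled here). The names `gaussian_kernel`, `regularized_fqi`, and `q_values` are hypothetical. Each iteration builds Bellman targets from the current Q estimate and refits it by kernel ridge regression, i.e. penalized least squares in the reproducing kernel Hilbert space.

```python
import numpy as np


def gaussian_kernel(A, B, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def regularized_fqi(S, A, R, S_next, actions, gamma=0.9, lam=1e-2,
                    n_iters=20, bandwidth=1.0):
    """Fitted Q-iteration with kernel ridge regression (illustrative sketch).

    S: (n, d) states, A: (n,) action indices, R: (n,) rewards,
    S_next: (n, d) next states drawn from the generative model.
    Returns the kernel weights alpha and the training inputs X.
    """
    n = len(S)
    # Assumption: the action index is appended to the state as an extra feature.
    X = np.column_stack([S, A])
    K = gaussian_kernel(X, X, bandwidth)
    # Kernel rows for each next state paired with every candidate action.
    K_next = [
        gaussian_kernel(np.column_stack([S_next, np.full(n, a)]), X, bandwidth)
        for a in actions
    ]
    # Penalized least squares: minimize (1/n) * sum of squared residuals
    # plus lam * squared RKHS norm, which gives alpha = (K + n*lam*I)^{-1} y.
    system = K + n * lam * np.eye(n)
    alpha = np.zeros(n)
    for _ in range(n_iters):
        # Greedy value of the current Q estimate at each next state.
        q_next = np.max([Kn @ alpha for Kn in K_next], axis=0)
        targets = R + gamma * q_next
        alpha = np.linalg.solve(system, targets)
    return alpha, X


def q_values(states, action, alpha, X, bandwidth=1.0):
    """Evaluate the fitted Q-function at the given states for one action."""
    inputs = np.column_stack([states, np.full(len(states), action)])
    return gaussian_kernel(inputs, X, bandwidth) @ alpha
```

A greedy policy can then be read off by comparing `q_values(states, a, alpha, X)` across the actions `a`. Larger values of `lam` give smoother Q estimates, which is the complexity control the abstract refers to.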

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Learning/Statistics & Optimisation
ID Code: 4931
Deposited By: Csaba Szepesvari
Deposited On: 24 March 2009