
Regularized Policy Iteration
Amir-massoud Farahmand, Mohammad Ghavamzadeh, Csaba Szepesvári, and Shie Mannor
In: Advances in Neural Information Processing Systems 21 (NIPS 2008), 8-11 December 2008, Vancouver, BC, Canada.

Abstract

In this paper we consider approximate policy-iteration-based reinforcement learning algorithms. In order to implement a flexible function approximation scheme, we propose the use of non-parametric methods with regularization, which provide a convenient way to control the complexity of the function approximator. We propose two novel regularized policy iteration algorithms by adding L2 regularization to two widely used policy evaluation methods: Bellman residual minimization (BRM) and least-squares temporal difference learning (LSTD). We derive efficient implementations of our algorithms when the approximate value functions belong to a reproducing kernel Hilbert space. We also provide finite-sample performance bounds for our algorithms and show that they achieve optimal rates of convergence under the studied conditions.
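
To make the regularization idea concrete, below is a minimal illustrative sketch in Python, not the paper's RKHS-based method (which involves a further, nested formulation): it adds a simple L2 (ridge) penalty to the linear system solved by LSTD with explicit features. The function name ridge_lstd and all parameter names are our own choices for illustration.

    import numpy as np

    def ridge_lstd(phi, phi_next, rewards, gamma=0.99, lam=1e-2):
        # Illustrative L2-regularized LSTD for policy evaluation.
        # phi      : (n, d) array, features of sampled states s_t
        # phi_next : (n, d) array, features of successor states s_{t+1}
        # rewards  : (n,) array, observed rewards r_t
        # Returns w such that V(s) is approximated by phi(s) @ w.
        n, d = phi.shape
        # Empirical LSTD system A w = b with
        # A = Phi^T (Phi - gamma * Phi') / n and b = Phi^T r / n.
        A = phi.T @ (phi - gamma * phi_next) / n
        b = phi.T @ rewards / n
        # The ridge term lam * I controls the complexity of the fitted
        # value function, mirroring the paper's use of L2 regularization.
        return np.linalg.solve(A + lam * np.eye(d), b)

    # Toy usage with random data, just to show the expected shapes.
    rng = np.random.default_rng(0)
    phi = rng.normal(size=(200, 10))
    phi_next = rng.normal(size=(200, 10))
    rewards = rng.normal(size=200)
    print(ridge_lstd(phi, phi_next, rewards).shape)  # -> (10,)

In the paper itself, the value-function estimate lives in a reproducing kernel Hilbert space and the penalty is the squared RKHS norm, yielding kernelized closed-form solutions; the finite-dimensional ridge version above is only an analogue of that construction.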

EPrint Type: Conference or Workshop Item (Paper)
Subjects: Learning/Statistics & Optimisation
ID Code: 4930
Deposited By: Csaba Szepesvári
Deposited On: 24 March 2009