PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Analysis of a Classification-based Policy Iteration Algorithm
Alessandro Lazaric, Mohammad Ghavamzadeh and Rémi Munos
In: Twenty-Seventh International Conference on Machine Learning (ICML-2010), 21-24 June 2010, Haifa, Israel.

Abstract

We present a classification-based policy iteration algorithm, called Direct Policy Iteration, and provide its finite-sample analysis. Our results give a performance bound in terms of the number of policy improvement steps, the number of rollouts used in each iteration, the capacity of the considered policy space, and a new capacity measure which indicates how well the policy space can approximate policies that are greedy with respect to any of its members. The analysis reveals a tradeoff between the estimation and approximation errors in this classification-based policy iteration setting. We also study the consistency of the method when there exists a sequence of policy spaces with increasing capacity.
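The abstract does not spell out the algorithm, so the following is only a rough illustration of the general idea behind classification-based policy iteration: at each iteration, estimate action values of the current policy with Monte Carlo rollouts on a set of sampled states, then pick the policy from a restricted policy space that best agrees with the empirically greedy actions. Everything concrete here is an assumption for illustration, not the authors' setup: the 10-state chain MDP, the threshold policy space, the uniform rollout-set distribution, the cost-sensitive loss (the gap between the estimated value of the empirically greedy action and that of the action the candidate picks), and the hypothetical helper names `step`, `act`, `rollout`, `estimate_q` and `dpi`.

```python
import random

# Hypothetical toy problem (not from the paper): a chain of states where
# action 1 moves right, action 0 moves left, and arriving at the last state pays 1.
N_STATES = 10
ACTIONS = (0, 1)
GAMMA = 0.9


def step(s, a, rng, slip):
    """Move in the chosen direction; with probability `slip` move the other way."""
    move = 1 if a == 1 else -1
    if rng.random() < slip:
        move = -move
    s_next = min(max(s + move, 0), N_STATES - 1)
    return s_next, (1.0 if s_next == N_STATES - 1 else 0.0)


def act(theta, s):
    """Assumed policy space: threshold policies that go right iff s >= theta."""
    return 1 if s >= theta else 0


# theta = 0 is "always right", theta = N_STATES is "always left".
POLICY_SPACE = range(N_STATES + 1)


def rollout(s, a, theta, horizon, rng, slip):
    """One truncated rollout: take action a in s, then follow policy theta."""
    s, r = step(s, a, rng, slip)
    total, discount = r, 1.0
    for _ in range(horizon - 1):
        discount *= GAMMA
        s, r = step(s, act(theta, s), rng, slip)
        total += discount * r
    return total


def estimate_q(s, theta, n_rollouts, horizon, rng, slip):
    """Monte Carlo estimate of Q(s, a) for every action, averaged over n_rollouts."""
    return [
        sum(rollout(s, a, theta, horizon, rng, slip) for _ in range(n_rollouts)) / n_rollouts
        for a in ACTIONS
    ]


def dpi(n_iters=5, n_states=20, n_rollouts=10, horizon=20, slip=0.1, seed=0):
    """Sketch of classification-based policy iteration over the threshold policies."""
    rng = random.Random(seed)
    theta = N_STATES  # start from the "always left" policy
    for _ in range(n_iters):
        # Rollout set: states drawn uniformly (the sampling distribution is an assumption).
        states = [rng.randrange(N_STATES) for _ in range(n_states)]
        q_hat = [estimate_q(s, theta, n_rollouts, horizon, rng, slip) for s in states]

        # Cost-sensitive classification loss: estimated value lost by following
        # the candidate instead of the empirically greedy action.
        def loss(candidate):
            return sum(max(q) - q[act(candidate, s)] for s, q in zip(states, q_hat)) / n_states

        # min() keeps the first minimiser, so ties go to the smallest theta.
        theta = min(POLICY_SPACE, key=loss)
    return theta


print(dpi(slip=0.0))
```

With `slip=0.0` the rollouts are deterministic, and `dpi(slip=0.0)` returns `0` (the "always right" policy): after the first iteration only states 8 and 9 show a positive estimated gain for moving right, every other state is a tie, and ties resolve to the smallest threshold. With the default `slip=0.1` the estimates are noisy, and `n_rollouts`, `n_states` and the size of `POLICY_SPACE` are the knobs that play the roles of the rollout count and policy-space capacity mentioned in the abstract.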

EPrint Type: Conference or Workshop Item (Paper)
Subjects: Information Retrieval & Textual Information Access
ID Code: 7229
Deposited By: Alessandro Lazaric
Deposited On: 12 March 2011