PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Analysis of a Classification-based Policy Iteration Algorithm
Alessandro Lazaric, Mohammad Ghavamzadeh and Rémi Munos
In: 27th International Conference on Machine Learning (ICML 2010), 21-24 June 2010, Haifa, Israel.


We present a classification-based policy iteration algorithm, called Direct Policy Iteration, and provide its finite-sample analysis. Our results state a performance bound in terms of the number of policy improvement steps, the number of rollouts used in each iteration, the capacity of the considered policy space, and a new capacity measure which indicates how well the policy space can approximate policies that are greedy w.r.t. any of its members. The analysis reveals a tradeoff between the estimation and approximation errors in this classification-based policy iteration setting. We also study the consistency of the method when there exists a sequence of policy spaces with increasing capacity.
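
To make the setting concrete, here is a minimal Python sketch of one way a classification-based policy iteration loop with rollouts could look. It is not the authors' implementation. The chain MDP, the threshold policy space, the loss (the gap between the best estimated action value and the value of the action the candidate policy picks), and every name below are assumptions made for illustration. A deterministic toy MDP is used, so a single rollout per state-action pair suffices; in a stochastic problem several rollouts would be averaged, and the rollout states would typically be sampled rather than enumerated.

```python
# Toy sketch of classification-based policy iteration with rollouts.
# The chain MDP, the threshold policy space, and all names are hypothetical.

N_STATES = 6          # chain states 0..5
ACTIONS = (0, 1)      # 0 = left, 1 = right
GAMMA = 0.9           # discount factor
HORIZON = 30          # rollout truncation length


def step(state, action):
    """Deterministic chain dynamics; reward 1 for landing on the last state."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0)


def threshold_policy(t):
    """Member of the (assumed) policy space: go right iff state >= t."""
    return lambda s: 1 if s >= t else 0


def rollout_q(state, action, policy, n_rollouts=1):
    """Rollout estimate of Q(state, action): take `action`, then follow `policy`."""
    total = 0.0
    for _ in range(n_rollouts):
        s, a, ret, discount = state, action, 0.0, 1.0
        for _ in range(HORIZON):
            s, r = step(s, a)
            ret += discount * r
            discount *= GAMMA
            a = policy(s)
        total += ret
    return total / n_rollouts


def improve(policy, rollout_states, thresholds):
    """Pick the threshold whose policy minimises the empirical greedy-action loss."""
    q = {s: [rollout_q(s, a, policy) for a in ACTIONS] for s in rollout_states}

    def loss(t):
        cand = threshold_policy(t)
        # Value lost by following the candidate instead of the estimated greedy action.
        return sum(max(q[s]) - q[s][cand(s)] for s in rollout_states)

    return min(thresholds, key=loss)  # ties go to the earliest threshold listed


def run(n_iterations=6):
    """Start from 'always left' and return the threshold chosen at each iteration."""
    thresholds = list(range(N_STATES, -1, -1))  # 6 (always left) ... 0 (always right)
    t = N_STATES
    history = []
    for _ in range(n_iterations):
        t = improve(threshold_policy(t), range(N_STATES), thresholds)
        history.append(t)
    return history


if __name__ == "__main__":
    print(run())
```

Because the candidate policy is restricted to the threshold family, the improvement step can only return the member that best imitates the estimated greedy actions. This is the kind of approximation the abstract's new capacity measure is meant to quantify, while the number of rollouts governs the estimation side of the tradeoff.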

EPrint Type: Conference or Workshop Item (Paper)
Subjects: Learning/Statistics & Optimisation; Theory & Algorithms
ID Code: 7379
Deposited By: Mohammad Ghavamzadeh
Deposited On: 17 March 2011