Analysis of a Classification-based Policy Iteration Algorithm
Alessandro Lazaric, Mohammad Ghavamzadeh and Rémi Munos
In: Twenty-Seventh International Conference on Machine Learning (ICML-2010), 21-24 June 2010, Haifa, Israel.
We present a classification-based policy iteration algorithm, called Direct Policy Iteration, and provide its finite-sample analysis. Our results state a performance bound in terms of the number of policy improvement steps, the number of rollouts used in each iteration, the capacity of the considered policy space, and a new capacity measure which indicates how well the policy space can approximate policies that are greedy w.r.t. any of its members. The analysis reveals a tradeoff between the estimation and approximation errors in this classification-based policy iteration setting. We also study the consistency of the method when there exists a sequence of policy spaces with increasing capacity.