PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Rollout Allocation Strategies for Classification-based Policy Iteration
Victor Gabillon, Alessandro Lazaric and Mohammad Ghavamzadeh
In: Workshop on Reinforcement Learning and Search in Very Large Spaces, Twenty-Seventh International Conference on Machine Learning (ICML-2010), 25 June 2010, Haifa, Israel.

Abstract

Classification-based policy iteration algorithms are variations of policy iteration that do not use any kind of value function representation. The main idea is (1) to replace the usual value function learning step with rollout estimates of the value function over a finite number of states, called the rollout set, and the actions in the action space, and (2) to cast the policy improvement step as a classification problem. The choice of rollout allocation strategy over states and actions has a significant impact on the performance and computation time of this class of algorithms. In this paper, we present new strategies to allocate the available budget (number of rollouts) at each iteration of the algorithm over states and actions. Our empirical results indicate that, for a fixed budget, the proposed strategies improve the accuracy of the training set over existing methods.
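
To make the two steps in the abstract concrete, the following is a minimal sketch of a single iteration of classification-based policy iteration. It is not the authors' implementation: the toy chain MDP, the constants, the 1-nearest-neighbour stand-in classifier, and all function names are assumptions made for illustration. The budget is split uniformly over (state, action) pairs, which is only a plain baseline; the allocation strategies proposed in the paper are not described in the abstract and are not reproduced here.

```python
import random

# Everything below is a hypothetical toy setup, not taken from the paper.
N_STATES = 6          # chain of states 0..5, reward on entering state 5
ACTIONS = (-1, +1)    # move left or move right
GAMMA = 0.9           # discount factor
HORIZON = 20          # rollouts are truncated after this many steps


def step(state, action, rng):
    """Toy dynamics: the move succeeds with probability 0.9, else it is reversed."""
    move = action if rng.random() < 0.9 else -action
    nxt = min(max(state + move, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward


def rollout(state, action, policy, rng):
    """One truncated rollout: take `action` in `state`, then follow `policy`."""
    total, discount = 0.0, 1.0
    for _ in range(HORIZON):
        state, reward = step(state, action, rng)
        total += discount * reward
        discount *= GAMMA
        action = policy(state)
    return total


def build_training_set(rollout_states, policy, budget, rng):
    """Step 1: estimate action values by rollouts (uniform budget allocation)."""
    per_pair = budget // (len(rollout_states) * len(ACTIONS))
    training_set = []
    for s in rollout_states:
        q_hat = {
            a: sum(rollout(s, a, policy, rng) for _ in range(per_pair)) / per_pair
            for a in ACTIONS
        }
        # Label each rollout state with its empirically best action.
        training_set.append((s, max(q_hat, key=q_hat.get)))
    return training_set


def fit_classifier(training_set):
    """Step 2: policy improvement as classification (1-NN stand-in classifier)."""
    def new_policy(state):
        _, label = min(training_set, key=lambda pair: abs(pair[0] - state))
        return label
    return new_policy


def always_left(state):
    """Initial policy: always move left."""
    return -1


rng = random.Random(0)
rollout_states = [0, 2, 4]   # the rollout set
training_set = build_training_set(rollout_states, always_left, budget=600, rng=rng)
improved_policy = fit_classifier(training_set)
```

With a uniform split, every (state, action) pair receives the same number of rollouts even when the best action at a state is already clear after a few samples. This is the kind of inefficiency a non-uniform allocation strategy can target, since mislabeled states reduce the accuracy of the training set passed to the classifier.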

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Learning/Statistics & Optimisation
ID Code: 7386
Deposited By: Mohammad Ghavamzadeh
Deposited On: 17 March 2011