Rollout Allocation Strategies for Classification-based Policy Iteration
Victor Gabillon, Alessandro Lazaric and Mohammad Ghavamzadeh
In: ICML 2010 - Workshop Reinforcement Learning and Search in Very Large Spaces, 21-24 June 2010, Haifa, Israel.
Classification-based policy iteration algorithms are variations of policy iteration that do not use any kind of value function representation. The main idea is 1) to replace the usual value function learning step with rollout estimates of the action values at a finite set of states, called the rollout set, for every action in the action space, and 2) to cast the policy improvement step as a classification problem. How rollouts are allocated over states and actions has a significant impact on both the performance and the computation time of this class of algorithms. In this paper, we present new strategies to allocate the available budget (number of rollouts) over states and actions at each iteration of the algorithm. Our empirical results indicate that, for a fixed budget, the proposed strategies produce a more accurate training set than existing methods.
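To make the two steps concrete, here is a minimal Python sketch of the simplest allocation scheme: the budget is split uniformly over every (state, action) pair of the rollout set, each action value is estimated by averaging rollouts, and each state is labeled with its empirically greedy action. This is not one of the strategies proposed in this paper. The chain MDP, the slip probability, the horizon, the discount, and the functions `step`, `rollout`, and `build_training_set` are hypothetical choices made for illustration. The resulting (state, label) pairs form the training set that a classifier would then fit; the classifier itself is omitted.

```python
import random
from statistics import mean

# Hypothetical toy chain MDP, for illustration only: states 0..N_STATES-1,
# actions -1 (left) and +1 (right). The intended move succeeds with
# probability 0.9, otherwise the agent moves the opposite way.
# Reward is 1 whenever the agent lands in the last state.
N_STATES = 5
ACTIONS = (-1, +1)
GOAL = N_STATES - 1


def step(state, action, rng):
    """Sample one transition; returns (next_state, reward)."""
    move = action if rng.random() < 0.9 else -action
    next_state = min(max(state + move, 0), GOAL)
    return next_state, (1.0 if next_state == GOAL else 0.0)


def rollout(state, action, policy, horizon, gamma, rng):
    """Discounted return of taking `action` in `state`, then following `policy`."""
    total, discount = 0.0, 1.0
    for t in range(horizon):
        state, reward = step(state, action if t == 0 else policy(state), rng)
        total += discount * reward
        discount *= gamma
    return total


def build_training_set(rollout_set, policy, budget, horizon=10, gamma=0.9, seed=0):
    """Uniform allocation (assumed baseline): give every (state, action) pair the
    same number of rollouts, estimate Q by Monte Carlo averages, and label each
    state with its empirically greedy action. Returns (training_set, rollouts_used)."""
    rng = random.Random(seed)
    per_pair = budget // (len(rollout_set) * len(ACTIONS))
    if per_pair == 0:
        raise ValueError("budget too small for one rollout per (state, action) pair")
    training_set, used = [], 0
    for s in rollout_set:
        q_hat = {
            a: mean(rollout(s, a, policy, horizon, gamma, rng) for _ in range(per_pair))
            for a in ACTIONS
        }
        used += per_pair * len(ACTIONS)
        # Ties are broken in favor of the first action in ACTIONS.
        training_set.append((s, max(q_hat, key=q_hat.get), q_hat))
    return training_set, used


if __name__ == "__main__":
    always_left = lambda s: -1  # current policy to be improved
    data, used = build_training_set(range(N_STATES), always_left, budget=400)
    print("rollouts used:", used)
    for s, label, q_hat in data:
        print(s, label, {a: round(q, 2) for a, q in q_hat.items()})
```

With uniform allocation, a state where one action clearly dominates consumes as many rollouts as a state where the action values are nearly tied; an allocation strategy that instead adapts `per_pair` to each state or action is one way to spend a fixed budget more effectively.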