Rollout Allocation Strategies for Classification-based Policy Iteration

Abstract

Classification-based policy iteration algorithms are variants of policy iteration that do not use any kind of value function representation. The main idea is (1) to replace the usual value function learning step with rollout estimates of the action-value function at a finite number of states, called the rollout set, for every action in the action space, and (2) to cast the policy improvement step as a classification problem. How rollouts are allocated over states and actions has a significant impact on both the performance and the computation time of this class of algorithms. In this paper, we present new strategies for allocating the available budget (the number of rollouts) over states and actions at each iteration of the algorithm. Our empirical results indicate that, for a fixed budget, the proposed strategies improve the accuracy of the training set compared with existing methods.
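To make the rollout step and the classification training set concrete, here is a minimal sketch. It does not implement the paper's proposed strategies. It assumes a simple uniform split of the budget, in which every state-action pair in the rollout set receives the same number of rollouts. Everything else in it is a hypothetical stand-in chosen for illustration: the toy 5-state chain simulator `chain_step`, the helpers `rollout_return`, `uniform_allocation` and `build_training_set`, the truncation horizon, and the discount factor. The code estimates Q(s, a) by averaging truncated rollouts. It then labels each rollout state with its empirically greedy action, which is the input a classifier would be trained on.

```python
import random

GOAL = 4  # hypothetical toy chain: states 0..4, state 4 is absorbing


def chain_step(state, action, rng):
    """Toy simulator (assumption): action 1 moves right, action 0 moves left,
    and with probability 0.1 the move is flipped. Reward is 1.0 only on the
    transition that enters GOAL; GOAL is absorbing with zero reward."""
    if state == GOAL:
        return state, 0.0
    move = 1 if action == 1 else -1
    if rng.random() < 0.1:
        move = -move
    nxt = min(max(state + move, 0), GOAL)
    return nxt, (1.0 if nxt == GOAL else 0.0)


def rollout_return(step, policy, s, a, horizon, gamma, rng):
    """One truncated rollout: take action a in state s, then follow policy."""
    total, discount = 0.0, 1.0
    state, action = s, a
    for _ in range(horizon):
        state, reward = step(state, action, rng)
        total += discount * reward
        discount *= gamma
        action = policy(state)
    return total


def uniform_allocation(rollout_set, actions, budget):
    """Hypothetical baseline: every (state, action) pair gets the same number
    of rollouts; any remainder of the budget is left unused."""
    per_pair = budget // (len(rollout_set) * len(actions))
    if per_pair == 0:
        raise ValueError("budget too small for one rollout per state-action pair")
    return {(s, a): per_pair for s in rollout_set for a in actions}


def build_training_set(step, policy, rollout_set, actions, budget,
                       horizon=10, gamma=0.9, seed=0):
    """Estimate Q(s, a) by averaging rollouts, then label each rollout state
    with its empirically greedy action (the classifier's training set)."""
    rng = random.Random(seed)
    q_hat = {}
    for (s, a), n in uniform_allocation(rollout_set, actions, budget).items():
        returns = [rollout_return(step, policy, s, a, horizon, gamma, rng)
                   for _ in range(n)]
        q_hat[(s, a)] = sum(returns) / n
    labels = [(s, max(actions, key=lambda a: q_hat[(s, a)])) for s in rollout_set]
    return labels, q_hat


if __name__ == "__main__":
    # Improve the poor policy "always move left" on the rollout set {0, 1, 2, 3}.
    labels, q_hat = build_training_set(chain_step, lambda s: 0,
                                       [0, 1, 2, 3], [0, 1], budget=200)
    print(labels)
```

In this toy setting, the uniform split gives state 3 the same number of rollouts as state 0. At state 3 one action is clearly better. At state 0 both actions have nearly the same estimated value. This gap illustrates why the choice of allocation over states and actions can affect training-set accuracy for a fixed budget.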