PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

FF+FPG: Guiding a Policy-Gradient Planner
Douglas Aberdeen and Olivier Buffet
In: Proceedings of the Seventeenth International Conference on Automated Planning and Scheduling (ICAPS'07) (2007). The AAAI Press, Providence, USA, pp. 42-48. ISBN 978-1-57735-344-7

Abstract

The Factored Policy-Gradient planner (FPG) was a successful competitor in the probabilistic track of the 2006 International Planning Competition (IPC). FPG is innovative because it scales to large planning domains through the use of Reinforcement Learning. It essentially performs a stochastic local search in policy space. FPG's weakness is its potentially long learning time: it initially acts randomly and improves its policy progressively each time the goal is reached. This paper shows how to use an external teacher to guide FPG's exploration. While any teacher can be used, we concentrate on the actions suggested by FF's heuristic, as FF-replan has proved efficient for probabilistic re-planning. To achieve this, FPG must learn its own policy while following another. We thus extend FPG to off-policy learning using importance sampling. The resulting algorithm is presented and evaluated on IPC benchmarks.
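The abstract's key technical step is learning one policy while acting according to another (the teacher), corrected by importance sampling. The Python sketch below illustrates only that generic idea and is not the paper's algorithm. It assumes a tabular softmax policy, a plain episodic REINFORCE estimator weighted by the likelihood ratio prod_t pi(a_t|s_t)/mu(a_t|s_t), and a hypothetical behaviour policy that follows a teacher action with probability eps and the learner otherwise. All names (softmax, mixture_behaviour, is_policy_gradient, train_toy) and the toy problem are illustrative assumptions.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def mixture_behaviour(theta, state, teacher_action, eps, rng):
    """Hypothetical behaviour policy mu: take the teacher's action with
    probability eps, otherwise sample from the learner's softmax policy.
    Returns the chosen action and mu(action | state)."""
    pi = softmax(theta[state])
    mu = [(1.0 - eps) * p for p in pi]
    mu[teacher_action] += eps
    action = rng.choices(range(len(mu)), weights=mu)[0]
    return action, mu[action]


def is_policy_gradient(theta, trajectory, behaviour_probs, ret):
    """Importance-sampled REINFORCE estimate of grad J(theta) for a tabular
    softmax policy, from one episode generated by a behaviour policy mu.

    trajectory      -- list of (state, action) pairs actually taken
    behaviour_probs -- mu(a_t | s_t) for each step of the trajectory
    ret             -- total return R(tau) of the episode
    """
    grad = {s: [0.0] * len(p) for s, p in theta.items()}
    rho = 1.0  # likelihood ratio prod_t pi(a_t|s_t) / mu(a_t|s_t)
    for (s, a), mu_a in zip(trajectory, behaviour_probs):
        pi = softmax(theta[s])
        rho *= pi[a] / mu_a
        # Gradient of log softmax w.r.t. the preferences: 1[b == a] - pi(b|s).
        for b in range(len(pi)):
            grad[s][b] += (1.0 if b == a else 0.0) - pi[b]
    return {s: [rho * ret * g for g in gs] for s, gs in grad.items()}


def train_toy(steps=50, seed=0, alpha=0.5, eps=0.5):
    """Hypothetical one-state problem: action 1 reaches the goal (reward 1),
    action 0 does not; the teacher always suggests action 1."""
    rng = random.Random(seed)
    theta = {0: [0.0, 0.0]}
    for _ in range(steps):
        a, mu_a = mixture_behaviour(theta, 0, 1, eps, rng)
        reward = 1.0 if a == 1 else 0.0
        g = is_policy_gradient(theta, [(0, a)], [mu_a], reward)
        # Gradient ascent on the learner's own parameters.
        theta[0] = [t + alpha * d for t, d in zip(theta[0], g[0])]
    return softmax(theta[0])


if __name__ == "__main__":
    print(train_toy())
```

Keeping eps < 1 in the mixture guarantees mu(a|s) > 0 wherever the learner's softmax policy is positive, which the importance weights require. The cost is that a product of ratios over a long episode can have high variance.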

EPrint Type: Book Section
Project Keyword: UNSPECIFIED
Subjects: Learning/Statistics & Optimisation
ID Code: 4046
Deposited By: S V N Vishwanathan
Deposited On: 25 February 2008