PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

PILCO: A Model-Based and Data-Efficient Approach to Policy Search
Marc P Deisenroth and Carl Edward Rasmussen
In: International Conference on Machine Learning, June 28 - July 2, 2011, Bellevue, WA, USA.


In this paper, we introduce PILCO, a practical, data-efficient model-based policy search method. PILCO reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way. By learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning, PILCO can cope with very little data and facilitates learning from scratch in only a few trials. Policy evaluation is performed in closed form using state-of-the-art approximate inference. Furthermore, policy gradients are computed analytically for policy improvement. We report unprecedented learning efficiency on challenging and high-dimensional control tasks.
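
The idea of a dynamics model that reports its own uncertainty can be illustrated with a toy example. The sketch below is not the authors' implementation. It assumes a one-dimensional Gaussian process with a squared-exponential kernel as the probabilistic model (the abstract does not name the model class), and it uses hand-picked hyperparameters and made-up training data. The names se_kernel, solve, and gp_predict are hypothetical helpers introduced only for this illustration.

```python
import math


def se_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return signal_var * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [A | b]; copy rows so the caller's data is untouched.
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def gp_predict(xs, ys, x_star, noise_var=1e-2):
    """Posterior mean and latent-function variance of a zero-mean GP at x_star."""
    # Gram matrix of the training inputs with observation noise on the diagonal.
    K = [[se_kernel(a, b) + (noise_var if i == j else 0.0)
          for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [se_kernel(a, x_star) for a in xs]
    alpha = solve(K, ys)      # K^{-1} y
    v = solve(K, k_star)      # K^{-1} k_*
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = se_kernel(x_star, x_star) - sum(k * w for k, w in zip(k_star, v))
    return mean, var


# Made-up training data: three noisy-free samples of sin(x).
train_x = [-1.0, 0.0, 1.0]
train_y = [math.sin(x) for x in train_x]

mean_near, var_near = gp_predict(train_x, train_y, 0.5)  # close to the data
mean_far, var_far = gp_predict(train_x, train_y, 5.0)    # far from the data
print("near data:", mean_near, var_near)
print("far from data:", mean_far, var_far)
```

On this toy data the predictive variance is small at x = 0.5, which lies between training points, and approaches the prior signal variance of 1.0 at x = 5.0, where the model has seen nothing. A planner that takes this variance into account, rather than trusting the mean prediction alone, is less exposed to the model bias the abstract describes.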

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics; Theory & Algorithms
ID Code: 8310
Deposited By: Marc Deisenroth
Deposited On: 19 October 2011