PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Model-based Reinforcement Learning with Continuous States and Actions
Marc Deisenroth, Carl Edward Rasmussen and Jan Peters
In: European Symposium on Artificial Neural Networks, 23-25 April, Bruges, Belgium.


Finding an optimal policy in a reinforcement learning (RL) framework with continuous state and action spaces is challenging, and approximate solutions are often inevitable. GPDP is an approximate dynamic programming algorithm based on Gaussian process (GP) models for the value functions. In this paper, we extend GPDP to the case of unknown transition dynamics. After building a GP model for the transition dynamics, we apply GPDP to this model and determine a continuous-valued policy over the entire state space. We apply the resulting controller to the underpowered pendulum swing-up task. Moreover, we compare our results on this RL task to a nearly optimal discrete DP solution obtained in a fully known environment.
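
To illustrate the first step named in the abstract (building a GP model for the transition dynamics), here is a minimal sketch. It is not the authors' implementation. The pendulum simulator, its physical parameters, the squared-exponential kernel with fixed hyperparameters, the choice of state differences as training targets, and all names (pendulum_step, se_kernel, GPDynamics) are assumptions made for illustration only; the value-function and policy steps of GPDP are not shown.

```python
import numpy as np


def pendulum_step(state, torque, dt=0.05, g=9.81, length=1.0, mass=1.0, friction=0.05):
    """Hypothetical pendulum simulator (one Euler step).

    Stands in for the unknown system that would generate training data.
    """
    angle, velocity = state
    accel = (torque - friction * velocity
             - mass * g * length * np.sin(angle)) / (mass * length ** 2)
    new_velocity = velocity + dt * accel
    new_angle = angle + dt * new_velocity
    return np.array([new_angle, new_velocity])


def se_kernel(A, B, lengthscales, signal_var):
    """Squared-exponential kernel with one length-scale per input dimension."""
    diff = (A[:, None, :] - B[None, :, :]) / lengthscales
    return signal_var * np.exp(-0.5 * np.sum(diff ** 2, axis=-1))


class GPDynamics:
    """Zero-mean GP regression from (angle, velocity, torque) to the state change.

    Hyperparameters are fixed by hand here (an assumption); in practice they
    would typically be learned from the data.
    """

    def __init__(self, X, Y, lengthscales, signal_var=1.0, noise_var=1e-4):
        self.X = X
        self.lengthscales = lengthscales
        self.signal_var = signal_var
        K = se_kernel(X, X, lengthscales, signal_var) + noise_var * np.eye(len(X))
        self.L = np.linalg.cholesky(K)
        # alpha = K^{-1} Y, computed via two triangular systems
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, Y))

    def predict(self, Xs):
        """Return the predictive mean (M, 2) and latent variance (M,) at inputs Xs."""
        Ks = se_kernel(Xs, self.X, self.lengthscales, self.signal_var)
        mean = Ks @ self.alpha
        v = np.linalg.solve(self.L, Ks.T)
        var = np.maximum(self.signal_var - np.sum(v ** 2, axis=0), 0.0)
        return mean, var


# Collect random transitions; the torque limit is below mass * g * length,
# so the simulated pendulum is underpowered.
rng = np.random.default_rng(0)
low = np.array([-np.pi, -5.0, -5.0])
high = np.array([np.pi, 5.0, 5.0])
X_train = rng.uniform(low, high, size=(300, 3))
Y_train = np.array([pendulum_step(x[:2], x[2]) - x[:2] for x in X_train])

model = GPDynamics(X_train, Y_train, lengthscales=np.array([1.0, 4.0, 4.0]))

# Predict the successor of one state-action pair as state + predicted change.
query = np.array([[0.5, 0.0, 2.0]])
delta_mean, delta_var = model.predict(query)
predicted_next_state = query[0, :2] + delta_mean[0]
```

The predictive variance indicates where the learned model is uncertain, which is one reason a GP is a natural choice when the dynamics must be learned from a limited number of observed transitions.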

Full text: PDF; PDF (Errata)
EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics; Learning/Statistics & Optimisation; Theory & Algorithms
ID Code: 4446
Deposited By: Marc Deisenroth
Deposited On: 13 March 2009