Gaussian Process Dynamic Programming
Marc Deisenroth, Carl Edward Rasmussen and Jan Peters
Reinforcement learning (RL) and optimal control of systems with continuous
states and actions require approximation techniques in most interesting cases.
In this article, we introduce Gaussian process dynamic programming (GPDP), an
approximate value-function-based RL algorithm. We consider both a classic
optimal control problem, where problem-specific prior knowledge is available,
and a classic RL problem, where only very general priors can be used.
For the classic optimal control problem, GPDP models the unknown value functions
with Gaussian processes and generalizes dynamic programming to continuous-valued
states and actions. For the RL problem, GPDP starts from a given initial state
and explores the state space using Bayesian active learning. To
design a fast learner, available data have to be used efficiently.
Hence, we propose to learn probabilistic models of the a priori unknown
transition dynamics and the value functions on the fly. In both
cases, we successfully apply the resulting continuous-valued controllers to the
under-actuated pendulum swing-up and analyze the performance of the suggested
algorithms. It turns out that GPDP uses data very efficiently and can be applied
to problems where classic dynamic programming would be cumbersome.
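
To make the central idea concrete, the following is a minimal sketch, not the authors' implementation, of a GP-based Bellman backup: the value function is represented by GP regression over a finite set of support states, so backups can be evaluated at arbitrary continuous successor states. The transition model f, reward r, kernel hyperparameters, and the toy one-dimensional problem are all illustrative assumptions.

```python
# Minimal sketch of the core GPDP idea (illustrative assumptions throughout):
# the value function is a GP fitted to a finite set of support states, so
# Bellman backups can query values at arbitrary continuous successor states.
import numpy as np

def rbf_kernel(A, B, lengthscale=0.5, signal_var=1.0):
    """Squared-exponential covariance between 1-D state arrays."""
    d2 = (A[:, None] - B[None, :]) ** 2
    return signal_var * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior_mean(X_train, y_train, X_test, noise_var=1e-4):
    """GP regression posterior mean with an RBF kernel (zero prior mean)."""
    K = rbf_kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
    alpha = np.linalg.solve(K, y_train)
    return rbf_kernel(X_test, X_train) @ alpha

# Hypothetical toy problem: 1-D state, small discrete action set.
f = lambda s, a: np.clip(s + 0.1 * a, -1.0, 1.0)   # assumed transition model
r = lambda s, a: -(s ** 2) - 0.01 * a ** 2          # assumed stage reward
gamma = 0.95
support_states = np.linspace(-1.0, 1.0, 25)         # GP training inputs
actions = np.array([-1.0, 0.0, 1.0])

V = np.zeros_like(support_states)                   # initial value estimates
for _ in range(100):
    # Bellman backup at each support state; the value at the continuous
    # successor states f(s, a) is read off the GP posterior mean, which is
    # how the value function generalizes beyond the support set.
    Q = np.stack([
        r(support_states, a)
        + gamma * gp_posterior_mean(support_states, V, f(support_states, a))
        for a in actions
    ])
    V = Q.max(axis=0)

print("V(0) ~", gp_posterior_mean(support_states, V, np.array([0.0]))[0])
```

The sketch omits the predictive variance, which GPDP additionally exploits, e.g. for Bayesian active learning when exploring the state space from a given initial state.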