PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Solving Deep Memory POMDPs with Recurrent Policy Gradients
Daan Wierstra, Alex Foerster, Jan Peters and Juergen Schmidhuber
In: International Conference on Artificial Neural Networks (ICANN 2007), September 2007, Porto, Portugal.


This paper presents Recurrent Policy Gradients, a model-free reinforcement learning (RL) method that creates limited-memory stochastic policies for partially observable Markov decision problems (POMDPs) requiring long-term memory of past observations. The approach approximates a policy gradient for a Recurrent Neural Network (RNN) by backpropagating return-weighted characteristic eligibilities through time. Using a “Long Short-Term Memory” (LSTM) architecture, we outperform other RL methods on two important benchmark tasks. Furthermore, we show promising results on a complex car driving simulation task.
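The core idea of the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: it uses a plain tanh RNN instead of LSTM, a dummy environment with made-up dynamics, and for brevity computes the return-weighted characteristic eligibility (grad of log-policy) only for the output weights; the full method would additionally backpropagate these eligibilities through time into the recurrent weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny recurrent stochastic policy: h_t = tanh(Wx x_t + Wh h_{t-1}),
# pi(a | h_t) = softmax(Wo h_t).  All sizes are arbitrary for illustration.
n_obs, n_hid, n_act = 3, 4, 2
Wx = rng.normal(0, 0.1, (n_hid, n_obs))
Wh = rng.normal(0, 0.1, (n_hid, n_hid))
Wo = rng.normal(0, 0.1, (n_act, n_hid))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def run_episode(T=5):
    """Roll out one episode in a dummy POMDP: random observations,
    reward +1 whenever action 0 is taken (purely illustrative)."""
    h = np.zeros(n_hid)
    traj, R = [], 0.0
    for _ in range(T):
        x = rng.normal(size=n_obs)   # observation only; true state is hidden
        h = np.tanh(Wx @ x + Wh @ h) # hidden state carries memory of the past
        p = softmax(Wo @ h)
        a = rng.choice(n_act, p=p)
        R += 1.0 if a == 0 else 0.0
        traj.append((h.copy(), p, a))
    return traj, R

def policy_gradient(episodes):
    """REINFORCE-style estimate: sum over time of the characteristic
    eligibility d log pi(a_t | h_t) / d Wo, weighted by the episode return."""
    gWo = np.zeros_like(Wo)
    for traj, R in episodes:
        for h, p, a in traj:
            dlogp = -np.outer(p, h)  # softmax part of d log pi / d Wo
            dlogp[a] += h            # plus the chosen action's term
            gWo += R * dlogp
    return gWo / len(episodes)

episodes = [run_episode() for _ in range(20)]
grad = policy_gradient(episodes)     # ascend this direction to improve Wo
```

Because the hidden state summarizes past observations, the policy can in principle condition on events arbitrarily far back, which is what makes such gradients applicable to deep-memory POMDPs.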

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Learning/Statistics & Optimisation
ID Code: 3588
Deposited By: Jan Peters
Deposited On: 13 February 2008