PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation
Nicol N. Schraudolph, Jin Yu and Douglas Aberdeen
In: NIPS 2005, 5-8 Dec 2005, Vancouver, Canada.


Reinforcement learning by direct policy gradient estimation is attractive in theory but in practice leads to notoriously ill-behaved optimization problems. We improve its robustness and speed of convergence with stochastic meta-descent, a gain vector adaptation method that employs fast Hessian-vector products. In our experiments the resulting algorithms outperform previously employed online stochastic, offline conjugate, and natural policy gradient methods.
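The gain vector adaptation named in the abstract can be illustrated with a minimal sketch. It is not the paper's policy gradient algorithm: it applies a generic form of the stochastic meta-descent update to a toy deterministic quadratic, written as minimization. The objective, the function names (grad, hess_vec, smd_minimize) and the parameter values (p0, mu, lam) are assumptions chosen for illustration only. The Hessian-vector product is exact here because the toy Hessian is a known diagonal matrix.

```python
def grad(theta):
    # Gradient of the assumed toy objective f(x, y) = 0.5 * (x**2 + 10 * y**2).
    return [theta[0], 10.0 * theta[1]]


def hess_vec(theta, v):
    # Hessian-vector product H v for the toy objective, where H = diag(1, 10).
    # No Hessian matrix is formed; only its action on v is computed.
    return [v[0], 10.0 * v[1]]


def smd_minimize(theta, steps=200, p0=0.05, mu=0.05, lam=0.99):
    # p0, mu and lam are illustrative values, not taken from the paper.
    n = len(theta)
    p = [p0] * n   # gain vector: one step size per parameter
    v = [0.0] * n  # running estimate of how theta depends on the log-gains
    for _ in range(steps):
        g = grad(theta)
        hv = hess_vec(theta, v)
        # Gain update: a gain grows when the gradient opposes v (steady
        # progress) and shrinks, by at most half, when it agrees (overshoot).
        p = [pi * max(0.5, 1.0 - mu * gi * vi) for pi, gi, vi in zip(p, g, v)]
        # Parameter update with per-parameter gains.
        theta = [ti - pi * gi for ti, pi, gi in zip(theta, p, g)]
        # Update v using the Hessian-vector product computed before the step.
        v = [lam * vi - pi * (gi + lam * hvi)
             for vi, pi, gi, hvi in zip(v, p, g, hv)]
    return theta, p


if __name__ == "__main__":
    final_theta, final_gains = smd_minimize([1.0, 1.0])
    print(final_theta, final_gains)
```

In a stochastic setting the gradient and the Hessian-vector product would be noisy estimates rather than the exact values used in this sketch.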

EPrint Type: Conference or Workshop Item (Poster)
Project Keyword: UNSPECIFIED
Subjects: Learning/Statistics & Optimisation; Theory & Algorithms
ID Code: 1665
Deposited By: Nicol Schraudolph
Deposited On: 28 November 2005