Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation
Nicol N. Schraudolph, Jin Yu, and Douglas Aberdeen
In: Advances in Neural Information Processing Systems (NIPS) 2005, 5–8 Dec 2005, Vancouver, Canada.
Reinforcement learning by direct policy gradient estimation is attractive
in theory but in practice leads to notoriously ill-behaved optimization
problems. We improve its robustness and speed of convergence with
stochastic meta-descent, a gain vector adaptation method that employs
fast Hessian-vector products. In our experiments the resulting algorithms
outperform previously employed online stochastic, offline conjugate, and
natural policy gradient methods.
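The gain-adaptation mechanics behind SMD can be illustrated with a small sketch. This is not the paper's policy-gradient implementation: as an assumption for illustration, a deterministic ill-conditioned quadratic stands in for the objective, so the Hessian-vector product is exact (H = A) rather than computed via a fast R-operator, and the parameter names (`p0`, `mu`, `lam`) are illustrative. The core updates — a per-parameter gain vector multiplied by a guarded meta-gradient factor, and an auxiliary vector `v` maintained with a Hessian-vector product — follow the standard SMD scheme.

```python
import numpy as np

def smd_minimize(A, b, w0, p0=0.005, mu=0.05, lam=0.99, steps=500):
    """Minimize f(w) = 0.5 w^T A w - b^T w with SMD-adapted gains.

    Illustrative sketch of stochastic meta-descent (SMD); hyperparameter
    values are assumptions, not taken from the paper.
    """
    w = np.asarray(w0, dtype=float).copy()
    p = np.full_like(w, p0)   # gain vector: per-parameter learning rates
    v = np.zeros_like(w)      # v tracks d w / d(log p), with decay lam
    for _ in range(steps):
        g = A @ w - b                          # gradient of the quadratic
        # Meta-step on the log-gains: gains grow where successive gradients
        # correlate; the max(0.5, .) guard bounds any single-step cut.
        p *= np.maximum(0.5, 1.0 - mu * g * v)
        # Hessian-vector product: exact for a quadratic (H = A). In general
        # it is obtained cheaply without forming H, e.g. via Pearlmutter's
        # R-operator -- the "fast Hessian-vector products" of the abstract.
        Hv = A @ v
        v = lam * v - p * (g + lam * Hv)
        w = w - p * g                          # gradient step with gains p
    return w

# Ill-conditioned quadratic: curvatures spread over two orders of magnitude,
# mimicking the badly scaled optimization landscapes the abstract describes.
A = np.diag([1.0, 10.0, 100.0])
b = np.ones(3)
w = smd_minimize(A, b, np.zeros(3))
```

A single scalar learning rate here must be small enough for the stiffest direction (curvature 100), starving the flattest one; the per-parameter gains let SMD enlarge steps along flat directions while the guard keeps the stiff ones stable.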