PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Reinforcement Learning by Advantage Weighted Regression
Gerhard Neumann and Jan Peters
In: ICML 08, 5 - 9 July 2008, Helsinki.


Recently, batch mode reinforcement learning (BMRL) methods have become more popular due to their higher learning speed, more stable learning processes and the higher quality of the resulting policy. However, these methods remain hard to use for continuous action spaces, which frequently occur in real-world control tasks, e.g., in robotics and in plant control. The greedy action selection commonly used in BMRL is particularly problematic, as it is expensive for continuous actions, can cause an unstable learning process, introduces an optimization bias and results in highly non-smooth policies unsuitable for real-world systems. In this paper we offer an alternative approach to reinforcement learning in which we aim to find good smooth approximations of the optimal policy by reducing the standard reinforcement learning problem to an iterative advantage-weighted regression problem. The resulting algorithm naturally produces smooth continuous policies and outperforms current state-of-the-art methods.
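
To make the idea of an advantage-weighted regression step concrete, the following is a minimal sketch of a single policy-fitting step, not the algorithm as specified in the paper. It assumes that advantage estimates are already available for each sampled state-action pair, that samples are weighted by an exponential of the advantage with a hypothetical `temperature` parameter, and that the policy mean is linear in the state. The function names, the ridge term and the weighting form are all illustrative assumptions.

```python
import numpy as np


def advantage_weighted_fit(states, actions, advantages, temperature=1.0, ridge=1e-6):
    """Fit a linear policy mean a ~ [s, 1] @ W by weighted least squares.

    Assumption: sample weights are exp(advantage / temperature), so actions
    with a higher estimated advantage pull the fit more strongly.
    """
    states = np.asarray(states, dtype=float)          # shape (n, d)
    actions = np.asarray(actions, dtype=float)        # shape (n, k)
    advantages = np.asarray(advantages, dtype=float)  # shape (n,)

    # Subtract the maximum before exponentiating for numerical stability;
    # this rescales all weights by a common factor and leaves the fit
    # unchanged up to the small ridge term.
    weights = np.exp((advantages - advantages.max()) / temperature)

    # Append a constant feature so the policy has a bias term.
    X = np.hstack([states, np.ones((len(states), 1))])

    # Solve the weighted normal equations (X^T D X + ridge I) W = X^T D a.
    XtD = X.T * weights
    gram = XtD @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(gram, XtD @ actions)


def policy_mean(W, states):
    """Evaluate the fitted linear policy mean at the given states."""
    states = np.asarray(states, dtype=float)
    X = np.hstack([states, np.ones((len(states), 1))])
    return X @ W
```

In an iterative scheme, a step like this would alternate with re-estimating the advantages under the updated policy. Because the policy comes from a regression over sampled actions rather than a greedy maximisation, it varies smoothly with the state by construction.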

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Learning/Statistics & Optimisation
ID Code: 4108
Deposited By: Gerhard Neumann
Deposited On: 29 March 2008