PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Filtered Reinforcement Learning
Douglas Aberdeen
In: European Conference on Machine Learning, Jun 2004, Pisa.


Reinforcement learning (RL) algorithms attempt to assign the credit for rewards to the actions that contributed to the reward. Thus far, credit assignment has been done in one of two ways: uniformly, or using a discounting model that assigns exponentially more credit to recent actions. This paper demonstrates an alternative approach to temporal credit assignment, taking advantage of exact or approximate prior information about correct credit assignment. Infinite impulse response (IIR) filters are used to model credit assignment information. IIR filters generalise exponentially discounting eligibility traces to arbitrary credit assignment models. This approach can be applied to any RL algorithm that employs an eligibility trace. The use of IIR credit assignment filters is explored using both the GPOMDP policy-gradient algorithm and the Sarsa(λ) temporal-difference algorithm. A drop in bias and variance of value or gradient estimates is demonstrated, resulting in faster convergence to better policies.
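
To illustrate how an IIR filter can generalise an exponentially discounted eligibility trace, the following is a minimal sketch based on the standard IIR difference equation z_t = sum_n b[n] x_{t-n} - sum_{n>=1} a[n] z_{t-n}, with a[0] = 1. It is not taken from the paper: the class name `IIRTrace`, the scalar input, and the coefficient conventions are assumptions made for illustration. With a = [1, -beta] and b = [1] the filter reduces to the familiar trace z_t = beta * z_{t-1} + x_t, while other coefficients encode other credit assignment models, such as a fixed delay between action and reward.

```python
from collections import deque


class IIRTrace:
    """Hypothetical scalar eligibility trace driven by an IIR filter.

    Implements z_t = sum_n b[n] * x_{t-n} - sum_{n>=1} a[n] * z_{t-n},
    assuming the usual normalisation a[0] == 1.
    """

    def __init__(self, a, b):
        if not a or not b:
            raise ValueError("a and b must be non-empty")
        if a[0] != 1.0:
            raise ValueError("a[0] must be 1 (normalised filter)")
        self.a = list(a)
        self.b = list(b)
        # Most recent value is kept at index 0 of each history buffer.
        self.x_hist = deque([0.0] * len(b), maxlen=len(b))
        self.z_hist = deque([0.0] * (len(a) - 1), maxlen=len(a) - 1)

    def update(self, x):
        """Feed one input (e.g. a gradient or state-visit term); return z_t."""
        self.x_hist.appendleft(x)
        feedforward = sum(bn * xn for bn, xn in zip(self.b, self.x_hist))
        feedback = sum(an * zn for an, zn in zip(self.a[1:], self.z_hist))
        z = feedforward - feedback
        self.z_hist.appendleft(z)
        return z


def impulse_response(a, b, steps):
    """Response of the filter to a unit impulse at t = 0."""
    trace = IIRTrace(a, b)
    return [trace.update(1.0 if t == 0 else 0.0) for t in range(steps)]


# Exponential discounting with beta = 0.9 as a special case.
exponential = impulse_response([1.0, -0.9], [1.0], 3)

# Assumed prior knowledge: credit is due exactly two steps after the action.
delayed = impulse_response([1.0], [0.0, 0.0, 1.0], 4)
```

In this sketch the impulse response shows how much credit an action taken at time 0 receives at each later step, so choosing a and b amounts to choosing the credit assignment model. In an algorithm such as GPOMDP or Sarsa(λ) the input would be a vector per step rather than a scalar; the scalar form is used here only to keep the example short.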

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Learning/Statistics & Optimisation
ID Code: 858
Deposited By: Adam Kowalczyk
Deposited On: 02 January 2005