PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Particle Filter-based Policy Gradient in POMDPs
Pierre-Arnaud Coquelin and Rémi Munos
In: Neural Information Processing Systems (2008).

Abstract

Our setting is a Partially Observable Markov Decision Process with continuous state, observation and action spaces. Decisions are based on a Particle Filter for estimating the belief state given past observations. We consider a policy gradient approach for parameterized policy optimization. For that purpose, we investigate sensitivity analysis of the performance measure with respect to the parameters of the policy, focusing on Finite Difference (FD) techniques. We show that the naive FD is subject to variance explosion because of the non-smoothness of the resampling procedure. We propose a more sophisticated FD method which overcomes this problem and establish its consistency.
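
The variance explosion mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' model or method: the two-particle setup, the function names and the parameter values below are all assumptions made for illustration. A single inverse-CDF resampling draw picks particle 0 (value 1) with probability theta and particle 1 (value 0) otherwise, so the expected value is theta and its true derivative is 1. Because the selected particle is a step function of theta, the per-sample central finite difference is zero almost everywhere and 1/(2h) on an event of probability 2h, which gives a variance of (1 - 2h)/(2h) that grows without bound as h shrinks, even with common random numbers.

```python
import numpy as np


def resample_value(theta, u):
    """One resampling draw over two particles (toy, hypothetical setup).

    Particle 0 has normalized weight theta and value 1.0; particle 1 has
    weight 1 - theta and value 0.0. Inverse-CDF resampling selects
    particle 0 when u < theta, so the result is a step function of theta.
    """
    return np.where(u < theta, 1.0, 0.0)


def naive_fd_samples(theta, h, n, rng):
    """Per-sample central finite differences with common random numbers."""
    u = rng.random(n)  # the same uniforms are reused at theta + h and theta - h
    return (resample_value(theta + h, u) - resample_value(theta - h, u)) / (2.0 * h)


rng = np.random.default_rng(0)
theta = 0.5  # assumed weight of particle 0
variances = {}
for h in (1e-1, 1e-2, 1e-3):
    d = naive_fd_samples(theta, h, 200_000, rng)
    variances[h] = d.var()
    # The mean stays near the true derivative while the variance grows as h shrinks.
    print(f"h={h:g}  mean={d.mean():.3f}  var={d.var():.1f}")
```

The sketch only shows why a non-smooth resampling step makes naive FD estimates noisy; it does not implement the improved FD method that the paper proposes.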

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Learning/Statistics & Optimisation
ID Code: 5143
Deposited By: Rémi Munos
Deposited On: 24 March 2009