Particle Filter-based Policy Gradient in POMDPs
Pierre-Arnaud Coquelin and Rémi Munos
In: Neural Information Processing Systems (2008).
Our setting is a Partially Observable Markov Decision Process with continuous
state, observation, and action spaces. Decisions are based on a Particle Filter
that estimates the belief state given past observations. We consider a policy
gradient approach to parameterized policy optimization. For that purpose, we
investigate sensitivity analysis of the performance measure with respect to the
parameters of the policy, focusing on Finite Difference (FD) techniques. We show
that the naive FD estimator suffers from variance explosion caused by the
non-smoothness of the resampling procedure. We propose a more sophisticated FD
method that overcomes this problem and establish its consistency.
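To make the setting concrete, the following is a minimal sketch (not the paper's method or benchmark) of the pipeline the abstract describes: a bootstrap particle filter tracks the belief state of a hypothetical linear-Gaussian POMDP, a policy parameterized by a scalar `theta` acts on the belief mean, and a naive central finite difference estimates the gradient of the return. All model dynamics, noise levels, and function names here are illustrative assumptions. The comment at the resampling step marks where the non-smoothness in `theta` arises: even with common random numbers, the resampling indices can jump discontinuously between the two perturbed runs, which is the source of the variance problem the paper analyzes.

```python
import numpy as np

def run_pomdp(theta, horizon=50, n_particles=200, seed=0):
    """Simulate a toy linear-Gaussian POMDP under the linear feedback policy
    a_t = -theta * (belief mean), tracking the belief with a bootstrap
    particle filter. Returns the cumulative reward, sum of -x_t^2.
    (Hypothetical toy model for illustration only.)"""
    rng = np.random.default_rng(seed)
    x = rng.normal()                              # hidden initial state
    particles = rng.normal(size=n_particles)      # particle approximation of the belief
    total_reward = 0.0
    for _ in range(horizon):
        a = -theta * particles.mean()             # decision based on the belief estimate
        # True (hidden) dynamics and noisy observation
        x = x + a + 0.5 * rng.normal()
        y = x + 1.0 * rng.normal()
        total_reward += -x**2
        # Particle filter step: propagate, weight by likelihood, resample
        particles = particles + a + 0.5 * rng.normal(size=n_particles)
        w = np.exp(-0.5 * (y - particles) ** 2) + 1e-300  # guard against degenerate weights
        w /= w.sum()
        # Multinomial resampling: a piecewise-constant, hence non-smooth,
        # function of theta -- the culprit behind the FD variance explosion
        idx = rng.choice(n_particles, size=n_particles, p=w)
        particles = particles[idx]
    return total_reward

def naive_fd_gradient(theta, h=1e-2, seed=0):
    """Naive central finite difference with common random numbers (shared seed).
    The shared seed does not help here: resampling indices can still differ
    between theta + h and theta - h, so the estimator's variance blows up as h -> 0."""
    return (run_pomdp(theta + h, seed=seed) - run_pomdp(theta - h, seed=seed)) / (2 * h)
```

Running `naive_fd_gradient` with a shrinking step `h` and comparing estimates across seeds exhibits the growing variance that motivates the paper's smoother FD construction.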