PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Variational methods for Reinforcement Learning
David Barber and Tom Furmston
In: AISTATS 2010, 13-15 May, 2010, Sardinia, Italy.


Casting the solution of a known Markov Decision Process (MDP) as probabilistic inference on a related graphical model has the benefit of opening the field of MDPs to recent developments in approximate inference. In this paper we extend this framework to the reinforcement learning problem, in which the transition and reward distributions are not given and need to be learned on the basis of interaction with the environment. A naive approach is to use a point estimate of the transition model. However, this does not reflect the uncertainty in the model of the environment and, as such, one cannot expect such a naive approach to form policies which maintain a degree of exploration. Instead, we suggest a Bayesian solution that maintains a posterior distribution over the transition model, which enables us to take account of the uncertainty in our knowledge of the transitions when planning. The resulting EM algorithm is formally intractable and we discuss two approximate solution methods, one based on variational Bayes and the other on expectation propagation.
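The abstract's core idea of maintaining a posterior distribution over the transition model can be illustrated with a small sketch. This is not the authors' code: it shows only the standard Dirichlet-multinomial conjugate update for a discrete MDP, with illustrative state/action sizes and an assumed symmetric prior.

```python
import numpy as np

# Minimal sketch (assumption, not the paper's implementation): a Dirichlet
# posterior over the transition probabilities p(s' | s, a) of a discrete MDP.
n_states, n_actions = 3, 2
alpha0 = 1.0  # symmetric Dirichlet prior pseudo-count (illustrative choice)

# One Dirichlet count vector per (state, action) pair.
alpha = np.full((n_states, n_actions, n_states), alpha0)

def update(alpha, s, a, s_next):
    """Bayesian update after observing the transition (s, a) -> s_next."""
    alpha = alpha.copy()
    alpha[s, a, s_next] += 1.0
    return alpha

def posterior_mean(alpha):
    """Posterior-mean transition matrix; rows normalise over s'."""
    return alpha / alpha.sum(axis=-1, keepdims=True)

# Observe a few transitions from interaction with the environment.
for (s, a, s_next) in [(0, 0, 1), (0, 0, 1), (0, 0, 2)]:
    alpha = update(alpha, s, a, s_next)

P = posterior_mean(alpha)
```

A point-estimate (naive) approach would plan with `P` alone; the Bayesian treatment in the paper instead integrates over the full posterior encoded by `alpha`, which is what makes the resulting EM algorithm intractable and motivates the variational Bayes and expectation propagation approximations.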

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics
Learning/Statistics & Optimisation
Theory & Algorithms
ID Code: 6100
Deposited By: David Barber
Deposited On: 08 March 2010