PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Bayesian Actor Critic: A Bayesian Model for Value Function Approximation and Policy Learning
Mohammad Ghavamzadeh and Yaakov Engel
In: Workshop on Regression in Robotics—Approaches and Applications, Robotics: Science and Systems Conference (RSS-2009), 28 June 2009, Seattle, WA, USA.

Abstract

In this paper, we present a Bayesian take on the actor-critic architecture. The proposed Bayesian actor-critic (BAC) model uses a class of non-parametric Bayesian critics based on Gaussian process temporal difference (GPTD) learning. Such critics model the action-value function as a Gaussian process, allowing Bayes' rule to be used to compute a posterior distribution over action-value functions, conditioned on the observed data. The Bayesian actor in BAC uses the posterior distribution over action-value functions computed by the critic to derive a posterior distribution over the gradient of the average discounted return with respect to the policy parameters. An appropriate choice of prior covariance (kernel) between state-action values, one that makes the action-value function compatible with the parametric family of policies, allows us to obtain closed-form expressions for the posterior distribution of the policy gradient. The posterior mean serves as our estimate of the gradient and is used to update the policy, while the posterior covariance allows us to gauge the reliability of that update.
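The closed-form construction the abstract describes can be sketched in a heavily simplified setting. The toy problem below (a two-armed bandit), the reward means, the noise level, and all function names are illustrative assumptions, not the paper's setup; in particular, returns are estimated by plain Monte Carlo sampling rather than GPTD. What the sketch does preserve is the key idea: with the compatible (Fisher) kernel k(a, a') = u(a)^T u(a'), where u(a) = grad_theta log pi(a; theta), the GP over Q reduces to a Bayesian linear model in the score features, so the posterior over the policy gradient is Gaussian with a closed-form mean and covariance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-armed bandit standing in for an MDP: arm 0 pays more
# on average. TRUE_MEANS and SIGMA2 are illustrative choices, not values
# from the paper.
TRUE_MEANS = np.array([1.0, 0.2])
SIGMA2 = 0.1  # observation-noise variance assumed in the GP likelihood


def policy_probs(theta):
    """Softmax policy pi(a; theta) over the two actions."""
    e = np.exp(theta - theta.max())
    return e / e.sum()


def score(theta, a):
    """Compatible feature u(a) = grad_theta log pi(a; theta)."""
    g = -policy_probs(theta)
    g[a] += 1.0
    return g


def bac_gradient(theta, n=400):
    """Posterior mean and covariance of the policy gradient.

    With the compatible (Fisher) kernel k(a, a') = u(a)^T u(a'), the GP
    over Q is a Bayesian linear model Q(a) = w^T u(a) with prior
    w ~ N(0, I), so the gradient G w (G the Fisher matrix) has a
    closed-form Gaussian posterior.
    """
    p = policy_probs(theta)
    actions = rng.choice(2, size=n, p=p)
    rewards = TRUE_MEANS[actions] + rng.normal(0.0, np.sqrt(SIGMA2), n)
    U = np.stack([score(theta, a) for a in actions], axis=1)  # d x n
    # Standard Bayesian linear-regression posterior over the weights w.
    S = np.linalg.inv(np.eye(2) + U @ U.T / SIGMA2)           # cov of w
    w_mean = S @ U @ rewards / SIGMA2                         # mean of w
    # grad eta = sum_a pi(a) Q(a) u(a) = G w,
    # with G = sum_a pi(a) u(a) u(a)^T the Fisher information matrix.
    G = sum(p[a] * np.outer(score(theta, a), score(theta, a))
            for a in range(2))
    return G @ w_mean, G @ S @ G.T


theta = np.zeros(2)
for _ in range(100):
    grad_mean, grad_cov = bac_gradient(theta)
    theta = theta + 1.0 * grad_mean  # ascend along the posterior mean
```

As the abstract notes, the posterior mean drives the update while the posterior covariance measures its reliability; in a sketch like this, a large trace of `grad_cov` could be used to shrink the step size or request more samples before updating.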

EPrint Type: Conference or Workshop Item (Poster)
Project Keyword: UNSPECIFIED
Subjects: Learning/Statistics & Optimisation
ID Code: 6121
Deposited By: Mohammad Ghavamzadeh
Deposited On: 08 March 2010