Bayesian Reinforcement Learning with Gaussian Process Temporal Difference Methods
Yaki Engel, Shie Mannor and Ron Meir
Reinforcement Learning is a class of problems frequently
encountered by both biological and artificial agents.
An important algorithmic component of many Reinforcement Learning
solution methods is the estimation of state or state-action values of
a fixed policy controlling a Markov decision process (MDP), a task
known as policy evaluation.
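To fix notation (a standard formulation, not specific to this paper; a discounted-return criterion is assumed), policy evaluation seeks, for a fixed policy \(\pi\), the value function

\[
V^{\pi}(x) \;=\; \mathbb{E}_{\pi}\!\left[\,\sum_{t=0}^{\infty} \gamma^{t}\, r(x_t) \,\middle|\, x_0 = x \right], \qquad \gamma \in (0,1),
\]

where the expectation is taken over state trajectories generated by running \(\pi\) in the MDP, and \(r\) is the (possibly random) reward.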
We present a novel Bayesian approach to policy evaluation in general
state and action spaces, which employs Gaussian processes (GPs) as
statistical generative models for value functions.
The posterior distribution of this GP-based statistical model
provides us with a value-function estimate, as well as a measure of the
variance of that estimate, opening the way to a range of possibilities
not previously available.
We derive exact expressions for the posterior moments
of the value GP, which admit both batch and recursive computations.
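To give a flavor of such expressions (this is the generic GP posterior under a linear-Gaussian observation model, written here as an illustration; the paper's exact expressions arise from the particular temporal-difference structure of its model): suppose the observed rewards \(\mathbf{r}\) relate to the latent values \(\mathbf{v}\) at the visited states by \(\mathbf{r} = H\mathbf{v} + N\), with Gaussian noise \(N \sim \mathcal{N}(0, \Sigma)\), and \(V\) has a zero-mean GP prior with kernel \(k\). Then

\[
\hat{V}(x) \;=\; \mathbf{k}(x)^{\top} H^{\top}\!\left(H K H^{\top} + \Sigma\right)^{-1}\mathbf{r},
\]
\[
\operatorname{Var}\!\left[V(x)\right] \;=\; k(x,x) \;-\; \mathbf{k}(x)^{\top} H^{\top}\!\left(H K H^{\top} + \Sigma\right)^{-1} H\,\mathbf{k}(x),
\]

where \(K\) is the Gram matrix over visited states and \(\mathbf{k}(x)\) the vector of kernel evaluations at \(x\). Recursive computation is possible because each new transition extends \(H\) and \(K\) by one row and column, admitting rank-one updates.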
A sequential kernel sparsification method allows us to derive
efficient online algorithms for learning good approximations
of the posterior moments.
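The following Python sketch illustrates the flavor of such a sequential sparsification scheme: an approximate-linear-dependence test that admits a state into a dictionary only if it is not well approximated, in feature space, by the states already stored. This is a hedged sketch of this kind of method, not the paper's exact algorithm; the class name, the threshold nu, and the RBF kernel are illustrative assumptions.

    import numpy as np

    def kernel(x, y, width=1.0):
        # Gaussian (RBF) kernel -- an illustrative choice; any
        # positive-definite kernel could be used instead.
        return np.exp(-np.sum((x - y) ** 2) / (2.0 * width ** 2))

    class Dictionary:
        """Sequential kernel sparsification via an approximate-linear-
        dependence style test (a sketch, not the paper's algorithm)."""

        def __init__(self, nu=1e-3):
            self.nu = nu          # sparsity threshold
            self.points = []      # dictionary states
            self.K_inv = None     # inverse Gram matrix of the dictionary

        def observe(self, x):
            if not self.points:
                self.points.append(x)
                self.K_inv = np.array([[1.0 / kernel(x, x)]])
                return True
            k_vec = np.array([kernel(d, x) for d in self.points])
            a = self.K_inv @ k_vec            # least-squares coefficients
            delta = kernel(x, x) - k_vec @ a  # projection residual
            if delta > self.nu:               # poorly approximated: admit x
                # Rank-one (partitioned-inverse) update of the Gram inverse.
                n = len(self.points)
                K_inv_new = np.zeros((n + 1, n + 1))
                K_inv_new[:n, :n] = self.K_inv + np.outer(a, a) / delta
                K_inv_new[:n, n] = -a / delta
                K_inv_new[n, :n] = -a / delta
                K_inv_new[n, n] = 1.0 / delta
                self.K_inv = K_inv_new
                self.points.append(x)
                return True
            return False

States streamed through observe yield a compact dictionary, over which approximate posterior moments can then be maintained online.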
By extending our algorithms to evaluate state-action
values, we derive model-free algorithms based on Policy Iteration for
improving policies, thus tackling the complete RL problem.
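Schematically, the improvement step of such a loop is greedy with respect to the estimated state-action values; a minimal sketch, in which gp_q_mean is a hypothetical callable standing in for the learned GP posterior mean of Q(s, a):

    def improve_policy(gp_q_mean, states, actions):
        # Greedy improvement: in each state, select the action that
        # maximizes the posterior mean of the state-action value.
        # gp_q_mean(s, a) is a hypothetical stand-in for the learned GP.
        return {s: max(actions, key=lambda a: gp_q_mean(s, a))
                for s in states}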
A companion paper describes experiments conducted with the algorithms presented here.