Gradient-based estimates of return distributions
Christos Dimitrakakis and Samy Bengio
In: Principled methods of trading exploration and exploitation, 6-7 July 2005, London, UK.
We present a general method for maintaining estimates of the
distribution of parameters in arbitrary models. We then apply the method
to the estimation of probability distributions over actions in
value-based reinforcement learning. Although this approach is similar to
other techniques that maintain a confidence measure for action-values,
it offers insight into current techniques and suggests potential
avenues for further research.