PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Gradient-based estimates of return distributions
Christos Dimitrakakis and Samy Bengio
In: Principled methods of trading exploration and exploitation, 6-7 July 2005, London, UK.


We present a general method for maintaining estimates of the distribution of parameters in arbitrary models. The method is then applied to estimating probability distributions over actions in value-based reinforcement learning. While this approach is similar to other techniques that maintain a confidence measure for action values, it nevertheless offers insight into current techniques and suggests potential avenues for further research.
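The abstract does not give the update rules, so the following is only a rough illustration of the general idea, not the authors' method. It assumes each action's return distribution is approximated by a Gaussian whose mean and variance are moved by stochastic-gradient steps with a fixed step size, and it turns those estimates into a probability distribution over actions by Monte Carlo sampling (the probability that an action's sampled return is the largest). The class `GradientReturnEstimate`, the function `action_probabilities`, the parameter `alpha` and the Gaussian assumption are all hypothetical.

```python
import math
import random


class GradientReturnEstimate:
    """Hypothetical Gaussian estimate of one action's return distribution.

    Mean and variance are both updated by stochastic-gradient steps with a
    fixed step size `alpha` (an assumption, not taken from the paper).
    """

    def __init__(self, alpha=0.1, mean=0.0, var=1.0):
        self.alpha = alpha
        self.mean = mean
        self.var = var

    def update(self, ret):
        err = ret - self.mean
        # Descent step on 0.5 * (ret - mean)^2 with respect to mean.
        self.mean += self.alpha * err
        # Descent step on 0.5 * (err^2 - var)^2 with respect to var.
        self.var += self.alpha * (err * err - self.var)
        # Keep the variance strictly positive so sampling stays valid.
        self.var = max(self.var, 1e-12)


def action_probabilities(estimates, n_samples=10000, seed=0):
    """Monte Carlo estimate of P(action a draws the highest return)."""
    rng = random.Random(seed)
    counts = [0] * len(estimates)
    for _ in range(n_samples):
        draws = [rng.gauss(e.mean, math.sqrt(e.var)) for e in estimates]
        best = max(range(len(draws)), key=draws.__getitem__)
        counts[best] += 1
    return [c / n_samples for c in counts]


if __name__ == "__main__":
    # Toy three-armed problem with Gaussian rewards (hypothetical data).
    rng = random.Random(1)
    true_means = [0.0, 0.5, 1.0]
    arms = [GradientReturnEstimate(alpha=0.05) for _ in true_means]
    for _ in range(2000):
        for est, m in zip(arms, true_means):
            est.update(rng.gauss(m, 1.0))
    print([round(p, 2) for p in action_probabilities(arms)])
```

Note that the variance here tracks the spread of the returns themselves, not the uncertainty of the mean estimate; a confidence measure for action values in the sense of the abstract would need the latter, which this sketch does not attempt.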

EPrint Type: Conference or Workshop Item (Oral)
Subjects: Computational, Information-Theoretic Learning with Statistics; Learning/Statistics & Optimisation; Theory & Algorithms
ID Code: 1482
Deposited By: Christos Dimitrakakis
Deposited On: 28 November 2005