PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Simulation Methods for Uncertain Decision-Theoretic Planning
Douglas Aberdeen and Olivier Buffet
In: IJCAI 2005 Workshop on Planning and Learning in A Priori Unknown or Dynamic Domains, 1 Aug 2005, Edinburgh, Scotland, UK.


Experience-based reinforcement learning (RL) systems are known to be useful for dealing with domains that are \emph{a priori} unknown. We believe that experience-based methods may also be useful when the model is uncertain (or even completely known). In this case, experience is gained by simulating the uncertain model. This paper explores a simple way to allow experience-based RL systems to cope with uncertainty in a model. The particular form of RL we consider is a policy-gradient method, and the particular domains we attempt to optimise in are drawn from temporal decision-theoretic planning. Our previous experience with military planning problems indicates that human-specified models are often inaccurate, especially where humans specify probabilities; planners that account for this uncertainty are therefore very useful. Despite our focus on policy-gradient RL for planning, our simple (but approximate) solution for dealing with uncertainty in the model can be applied to any simulation-based RL method, such as Q-learning or SARSA. Our attempt to solve decision-theoretic planning problems with a policy-gradient approach is itself novel, constituting a further contribution of this paper.
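The abstract's idea of gaining experience by simulating an uncertain model can be sketched roughly as follows. This is a hypothetical toy illustration, not code from the paper: a tiny chain MDP whose human-specified success probability is treated as uncertain, so each simulated episode draws a fresh model from a Beta prior around the nominal value before a standard simulation-based learner (here SARSA, one of the methods the abstract names) is run on it. All names and parameters are assumptions.

```python
import random

# Toy chain MDP: states 0..2, actions 0 ("wait") and 1 ("advance");
# state 2 is the goal. The human-specified success probability of
# "advance" is treated as uncertain: each simulated episode samples it
# from a Beta prior centred on the nominal value, so the learner's
# experience reflects the model uncertainty directly.

NOMINAL_P = 0.8      # human-specified success probability (assumed uncertain)
PRIOR_STRENGTH = 20  # pseudo-counts encoding confidence in the nominal value

def sample_model(rng):
    """Draw one plausible success probability from a Beta prior."""
    a = NOMINAL_P * PRIOR_STRENGTH
    b = (1.0 - NOMINAL_P) * PRIOR_STRENGTH
    return rng.betavariate(a, b)

def step(state, action, p_success, rng):
    """Action 1 advances with prob p_success; reward 1 on reaching the goal."""
    if action == 1 and rng.random() < p_success:
        state += 1
    reward = 1.0 if state == 2 else 0.0
    return state, reward, state == 2

def sarsa(episodes=2000, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(3) for a in (0, 1)}

    def policy(s):
        if rng.random() < eps:
            return rng.choice((0, 1))
        return max((0, 1), key=lambda a: Q[(s, a)])

    for _ in range(episodes):
        p = sample_model(rng)        # fresh model draw each episode
        s, a = 0, policy(0)
        for _ in range(50):          # episode cutoff
            s2, r, done = step(s, a, p, rng)
            a2 = policy(s2)
            target = r if done else r + gamma * Q[(s2, a2)]
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            if done:
                break
            s, a = s2, a2
    return Q

Q = sarsa()
```

After training, Q should prefer "advance" (action 1) in both non-terminal states even though the learner never saw the single "true" model, only draws from the prior.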

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Learning/Statistics & Optimisation
ID Code: 1709
Deposited By: Olivier Buffet
Deposited On: 28 November 2005