PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Policy-Gradients for PSRs and POMDPs
Douglas Aberdeen, Olivier Buffet and Owen Thomas
In: Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics (AIstats2007) (2007) Society for Artificial Intelligence and Statistics, San Juan, Puerto Rico.


In uncertain and partially observable environments, control policies must be a function of the complete history of actions and observations. Rather than present an ever-growing history to a learner, we instead track sufficient statistics of the history and map those to a control policy. The mapping has typically been done using dynamic programming, requiring large amounts of memory. We present a general approach to mapping sufficient statistics directly to control policies by combining the tracking of sufficient statistics with the use of policy-gradient reinforcement learning. The best-known sufficient statistic is the belief state, computed from a known or estimated partially observable Markov decision process (POMDP) model. More recently, predictive state representations (PSRs) have emerged as a potentially compact model of partially observable systems. Our experiments explore the usefulness of both of these sufficient statistics, exact and estimated, in direct policy search.
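
To make the idea of tracking a sufficient statistic and mapping it to a policy concrete, the following is a minimal sketch, not the authors' implementation. It assumes a small tabular POMDP with hypothetical tables T[a][s][s2] (transition probabilities) and O[a][s2][o] (observation probabilities), and a linear-softmax policy with hypothetical parameters theta; the abstract does not specify any of these choices. The belief is updated with a standard Bayes filter, and log_policy_gradient returns the gradient of log pi(a | b), the quantity that likelihood-ratio policy-gradient estimators accumulate.

```python
import math


def belief_update(belief, action, observation, T, O):
    """Bayes filter: b'(s2) is proportional to O[a][s2][o] * sum_s T[a][s][s2] * b(s)."""
    n = len(belief)
    unnormalized = [
        O[action][s2][observation]
        * sum(T[action][s][s2] * belief[s] for s in range(n))
        for s2 in range(n)
    ]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / total for u in unnormalized]


def softmax_policy(theta, belief):
    """Action probabilities from a linear-softmax over the belief vector.

    theta has one row of weights per action (an assumed parameterisation).
    """
    scores = [sum(w * b for w, b in zip(row, belief)) for row in theta]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def log_policy_gradient(theta, belief, action):
    """Gradient of log pi(action | belief) with respect to theta (same shape as theta)."""
    probs = softmax_policy(theta, belief)
    return [
        [((1.0 if a == action else 0.0) - probs[a]) * b for b in belief]
        for a in range(len(theta))
    ]


# Toy model (assumed for illustration): 2 states, 2 actions, 2 observations.
# Both actions leave the state unchanged; observations are 85% accurate.
T = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
O = [[[0.85, 0.15], [0.15, 0.85]], [[0.85, 0.15], [0.15, 0.85]]]

belief = belief_update([0.5, 0.5], action=0, observation=0, T=T, O=O)
theta = [[0.0, 0.0], [0.0, 0.0]]
grad = log_policy_gradient(theta, belief, action=0)
```

In a full learner, the gradient returned here would be weighted by observed rewards and used to adjust theta; a PSR prediction vector could stand in for the belief vector as the policy input without changing the policy code.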

EPrint Type: Book Section
Subjects: Learning/Statistics & Optimisation
ID Code: 4043
Deposited By: S V N Vishwanathan
Deposited On: 25 February 2008