PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

A Two-Teams Approach for Robust Probabilistic Temporal Planning
Olivier Buffet and Douglas Aberdeen
In: ECML'05 workshop on Reinforcement Learning in Non-Stationary Environments, 7 Oct 2005, Porto, Portugal.


Large real-world Probabilistic Temporal Planning (PTP) is a very challenging research field. A common approach is to model such problems as Markov Decision Problems (MDP) and use dynamic programming techniques. Yet, two major difficulties arise: 1- dynamic programming does not scale with the number of tasks, and 2- the probabilistic model may be uncertain, leading to the choice of unsafe policies. We build here on the Factored Policy Gradient (FPG) algorithm and on robust decision-making to address both difficulties through an algorithm that trains two competing teams of learning agents. As the learning is simultaneous, each agent is facing a non-stationary environment. The goal is for them to find a common Nash equilibrium.

PDF - Requires Adobe Acrobat Reader or other PDF viewer.
EPrint Type:Conference or Workshop Item (Paper)
Project Keyword:Project Keyword UNSPECIFIED
Subjects:Learning/Statistics & Optimisation
ID Code:1711
Deposited By:Olivier Buffet
Deposited On:28 November 2005