A multitask representation using reusable local policy templates
Constructing robust controllers to perform tasks in large, continually changing worlds is a difficult problem. A long-lived agent placed in such a world may be required to perform a variety of different tasks. For this to be possible, the agent needs to be able to abstract its experiences in a reusable way. This paper addresses the problem of online multitask decision making in such complex worlds, where models of change are inherently incomplete. A fully general version of this problem is intractable, but many interesting domains are rendered manageable by the fact that all instances of tasks may be described using a finite set of qualitatively meaningful contexts. We propose an approach to solving the multitask problem by decomposing the domain into a set of capabilities based on these local contexts. Capabilities resemble the options of hierarchical reinforcement learning, but provide robust behaviours that achieve some subgoal with an associated guarantee of attaining at least a particular aspiration level of performance. This guarantee allows the policies to be used within a planning framework, where they form a level of abstraction that factorises an otherwise large domain into task-independent sub-problems, with well-defined interfaces between the perception, control and planning problems. We demonstrate the approach in a stochastic navigation example, in which an agent reaches different goals in different world instances without relearning.
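To make the idea of planning over capabilities concrete, the following is a minimal illustrative sketch, not the method used in the paper. It assumes that each capability can be summarised by the context in which it applies, the subgoal it reaches, and an aspiration level modelled here as a lower bound on success probability in (0, 1]. The `Capability` class, the `plan` function, and the room and corridor names are all hypothetical. Under these assumptions, choosing a chain of capabilities reduces to a shortest-path search over contexts with edge cost -log(aspiration), so that the cheapest chain has the largest guaranteed success bound.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    """Hypothetical local policy template: valid in one context, reaches one subgoal."""
    name: str
    context: str       # qualitative context in which the policy applies
    subgoal: str       # context the policy is meant to reach
    aspiration: float  # assumed guarantee: lower bound on success probability, in (0, 1]


def plan(capabilities, start, goal):
    """Return (capability names, guaranteed success bound) for the best chain.

    Dijkstra over contexts with edge cost -log(aspiration), so the cheapest
    path maximises the product of the per-capability guarantees.
    """
    outgoing = {}
    for cap in capabilities:
        outgoing.setdefault(cap.context, []).append(cap)

    frontier = [(0.0, start, [])]
    settled = set()
    while frontier:
        cost, context, path = heapq.heappop(frontier)
        if context == goal:
            return path, math.exp(-cost)
        if context in settled:
            continue
        settled.add(context)
        for cap in outgoing.get(context, []):
            step = -math.log(cap.aspiration)
            heapq.heappush(frontier, (cost + step, cap.subgoal, path + [cap.name]))
    return None, 0.0  # no chain of capabilities reaches the goal


# Illustrative, made-up capability set for a small navigation domain.
CAPABILITIES = [
    Capability("cross_corridor", "room_a", "corridor", 0.95),
    Capability("enter_room_b", "corridor", "room_b", 0.90),
    Capability("shortcut", "room_a", "room_b", 0.80),
    Capability("enter_room_c", "corridor", "room_c", 0.85),
]

# The same capability set serves two different goals without any relearning.
path_b, bound_b = plan(CAPABILITIES, "room_a", "room_b")
path_c, bound_c = plan(CAPABILITIES, "room_a", "room_c")
```

In this sketch the planner never inspects how a capability behaves internally; it relies only on the declared context, subgoal and guarantee. That is one way to read the well-defined interface between control and planning described above, though the actual interface in the paper may differ.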