PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Apprenticeship learning using inverse reinforcement learning and gradient methods
G Neu and Csaba Szepesvari
In: UAI-07 (2007).


In this paper we propose a novel gradient algorithm to learn a policy from an expert's observed behavior, assuming that the expert behaves optimally with respect to some unknown reward function of a Markovian Decision Problem. The algorithm's aim is to find a reward function such that the resulting optimal policy closely matches the expert's observed behavior. The main difficulty is that the mapping from the parameters to policies is both nonsmooth and highly redundant. Resorting to subdifferentials solves the first difficulty, while the second one is overcome by computing natural gradients. We tested the proposed method in two artificial domains and found it to be more reliable and efficient than some previous methods.
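
The idea of fitting reward parameters so that the induced policy matches an expert can be illustrated with a toy sketch. This is not the paper's algorithm; everything here is an assumption made for illustration: a 5-state chain with two actions, a tabular reward paid on arrival in a state, a Boltzmann (softmax) policy that sidesteps the nonsmoothness instead of using subdifferentials, and a plain finite-difference gradient with step halving in place of natural gradients. All names (`q_values`, `soft_policy`, `fit`, and so on) are hypothetical.

```python
import numpy as np

# Toy chain MDP (assumed for illustration): action 0 moves left, action 1 moves right.
N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.9

def next_state(s, a):
    # Walls clamp the move, so the agent stays put at either end.
    return max(s - 1, 0) if a == 0 else min(s + 1, N_STATES - 1)

NEXT = np.array([[next_state(s, a) for a in range(N_ACTIONS)]
                 for s in range(N_STATES)])

def q_values(theta, n_iter=200):
    """Value iteration; theta[s] is the reward received on arriving in state s."""
    q = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(n_iter):
        v = q.max(axis=1)
        q = theta[NEXT] + GAMMA * v[NEXT]
    return q

def greedy_policy(theta):
    """Optimal (greedy) action in each state for the reward theta."""
    return q_values(theta).argmax(axis=1)

def soft_policy(theta, temperature=1.0):
    """Boltzmann policy over the optimal Q-values (a smooth stand-in)."""
    q = q_values(theta)
    z = (q - q.max(axis=1, keepdims=True)) / temperature
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

def loss(theta, expert):
    """Squared distance between the induced policy and the expert's policy."""
    return float(np.sum((soft_policy(theta) - expert) ** 2))

def numeric_grad(theta, expert, eps=1e-5):
    """Central finite-difference gradient of the loss w.r.t. theta."""
    g = np.zeros_like(theta)
    for i in range(theta.size):
        d = np.zeros_like(theta)
        d[i] = eps
        g[i] = (loss(theta + d, expert) - loss(theta - d, expert)) / (2 * eps)
    return g

def fit(expert, steps=100, lr=0.2):
    """Gradient descent that only accepts steps which lower the loss."""
    theta = np.zeros(N_STATES)
    best = loss(theta, expert)
    for _ in range(steps):
        candidate = theta - lr * numeric_grad(theta, expert)
        cand_loss = loss(candidate, expert)
        if cand_loss < best:
            theta, best = candidate, cand_loss
        else:
            lr *= 0.5  # step too large: halve it and try again
    return theta, best
```

As a usage example, an expert that always moves right can be encoded as `expert = np.zeros((N_STATES, N_ACTIONS)); expert[:, 1] = 1.0` and passed to `fit(expert)`. The redundancy mentioned in the abstract also shows up in this toy setting: `greedy_policy(theta)` and `greedy_policy(3 * theta + 7)` select the same actions, because positive scaling and constant shifts of the reward leave the optimal policy unchanged.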

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics
Learning/Statistics & Optimisation
Theory & Algorithms
ID Code: 6350
Deposited By: Csaba Szepesvari
Deposited On: 08 March 2010