PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Pseudometrics for State Aggregation in Average Reward Markov Decision Processes
Ronald Ortner
In: Algorithmic Learning Theory, 18th International Conference, ALT 2007, Sendai, Japan, October 1-4, 2007, Proceedings. Lecture Notes in Computer Science (4754). (2007) Springer, pp. 373-387. ISBN 978-3-540-75224-0

Abstract

We consider how state similarity in average reward Markov decision processes (MDPs) may be described by pseudometrics. Introducing the notion of adequate pseudometrics, which are well adapted to the structure of the MDP, we show how these may be used for state aggregation. Upper bounds on the loss that may be caused by working on the aggregated MDP instead of the original one are given and compared to the bounds that have been achieved for discounted reward MDPs.
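
As a rough illustration of what aggregating states under a pseudometric can look like, the following minimal sketch groups states whose distance to a block representative is at most a threshold. Everything in it is an assumption made for illustration: the toy rewards, the reward-only distance `reward_distance`, the threshold `eps`, and the greedy grouping in `aggregate_states` are hypothetical and are not the paper's definition of an adequate pseudometric or its aggregation procedure. In particular, a distance that ignores transition probabilities would generally not be well adapted to the structure of an MDP.

```python
def aggregate_states(states, d, eps):
    """Greedily group states into blocks.

    A state joins the first block whose representative (the block's
    first element) is within distance eps; otherwise it opens a new block.
    This is an illustrative heuristic, not the method from the paper.
    """
    blocks = []
    for s in states:
        for block in blocks:
            if d(block[0], s) <= eps:
                block.append(s)
                break
        else:
            # No existing representative is close enough.
            blocks.append([s])
    return blocks


# Hypothetical toy MDP with a single action: mean reward per state.
rewards = {"s0": 0.10, "s1": 0.12, "s2": 0.90, "s3": 0.88}


def reward_distance(s, t):
    # A pseudometric on states: symmetric, zero on the diagonal, and it
    # satisfies the triangle inequality, but two distinct states with
    # equal rewards would have distance 0.
    return abs(rewards[s] - rewards[t])


blocks = aggregate_states(list(rewards), reward_distance, eps=0.05)
print(blocks)
```

With these toy values the four states collapse into two blocks, one for the low-reward states and one for the high-reward states. A coarser threshold merges more states and shrinks the aggregated model, at the price of a potentially larger loss, which is the trade-off the bounds mentioned in the abstract address.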

EPrint Type: Book Section
Project Keyword: UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics; Learning/Statistics & Optimisation; Theory & Algorithms
ID Code: 3251
Deposited By: Ronald Ortner
Deposited On: 02 February 2008