PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Adaptive Aggregation for Reinforcement Learning in Average Reward Markov Decision Processes
Ronald Ortner
Annals of Operations Research Volume 208, Number 1, pp. 321-336, 2011.

Abstract

We present an algorithm that aggregates states online while learning to behave optimally in an average reward Markov decision process. The algorithm is based on the reinforcement learning algorithm UCRL and uses confidence intervals to aggregate the state space. We derive bounds on the regret that our algorithm suffers with respect to an optimal policy. These bounds are only slightly worse than the original bounds for UCRL.
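
As a rough illustration of aggregating states with confidence intervals, the following minimal sketch groups states whose reward confidence intervals share a common point. It is not the algorithm from the paper: the Hoeffding-style radius, the restriction to rewards of a single action, the greedy grouping rule, and the names confidence_radius and aggregate_states are all assumptions made for this example.

```python
import math


def confidence_radius(n_visits, t, delta=0.05):
    """Hoeffding-style radius for a mean in [0, 1] (assumed form, not from the paper)."""
    n = max(1, n_visits)
    return math.sqrt(math.log(2 * t / delta) / (2 * n))


def aggregate_states(mean_rewards, visits, t, delta=0.05):
    """Greedily group states whose reward confidence intervals share a common point.

    mean_rewards[s] and visits[s] are the empirical mean reward and the visit
    count of state s (one action only, to keep the sketch short).
    Returns a list of groups, each a sorted list of state indices.
    """
    intervals = []
    for s, (mean, n) in enumerate(zip(mean_rewards, visits)):
        radius = confidence_radius(n, t, delta)
        intervals.append((mean - radius, mean + radius, s))
    intervals.sort()  # process states in order of their lower bound

    groups = []
    for lower, upper, s in intervals:
        if groups and lower <= groups[-1]["upper"]:
            # The interval still intersects every interval already in the group.
            groups[-1]["states"].append(s)
            groups[-1]["upper"] = min(groups[-1]["upper"], upper)
        else:
            groups.append({"upper": upper, "states": [s]})
    return [sorted(g["states"]) for g in groups]


if __name__ == "__main__":
    rewards = [0.10, 0.12, 0.90, 0.88]
    # Few observations: intervals are wide, so all states fall into one group.
    print(aggregate_states(rewards, [1, 1, 1, 1], t=4))
    # Many observations: intervals shrink and the aggregation becomes finer.
    print(aggregate_states(rewards, [1000] * 4, t=4000))
```

In this sketch the aggregation is coarse while little data is available and is refined as visit counts grow, which is one way to read "aggregates online". The criterion analysed in the paper may differ, for example by also comparing transition probability estimates.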

EPrint Type: Article
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Learning/Statistics & Optimisation; Theory & Algorithms; COMPLACS
ID Code: 8500
Deposited By: Ronald Ortner
Deposited On: 03 February 2012