PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Combinations and mixtures of optimal policies in unichain MDPs are optimal
Ronald Ortner
(2005) Cornell University.

Abstract

We show that combinations of optimal (stationary) policies in unichain Markov decision processes are optimal. That is, let M be a unichain Markov decision process with state space S, action space A and policies pi_j*: S->A (1 <= j <= n), each with optimal average infinite-horizon reward. Then any combination pi of these policies, where for each state i in S there is a j such that pi(i)=pi_j*(i), is optimal as well. Furthermore, we prove that any mixture of optimal policies, where at each visit to a state i an arbitrary action pi_j*(i) of an optimal policy is chosen, also yields optimal average reward.
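
The combination claim can be checked numerically on a toy instance. The sketch below is only an illustration, not taken from the paper: the two-state MDP, its transition probabilities, its rewards, and the names P, R and average_reward are all assumptions made for this example. Every transition probability is positive, so each stationary policy induces a single recurrent class and the MDP is unichain. The code computes the average reward of a stationary deterministic policy from the closed-form stationary distribution of a two-state chain, then compares two optimal policies with every combination of them. It does not cover the mixture result, where the action may change from visit to visit.

```python
from itertools import product

# Hypothetical 2-state unichain MDP; all numbers are illustrative assumptions.
# P[s][a] = (probability of moving to state 0, probability of moving to state 1)
P = {
    0: {0: (0.5, 0.5), 1: (0.2, 0.8), 2: (0.5, 0.5)},
    1: {0: (0.5, 0.5), 1: (0.9, 0.1)},
}
# R[s][a] = expected one-step reward for taking action a in state s
R = {
    0: {0: 0.5, 1: 0.2, 2: 0.0},
    1: {0: 1.5, 1: 1.9},
}


def average_reward(policy):
    """Long-run average reward of a stationary deterministic policy.

    policy is a tuple (action in state 0, action in state 1). Uses the
    closed-form stationary distribution of a two-state Markov chain.
    """
    p = P[0][policy[0]][1]  # probability of moving from state 0 to state 1
    q = P[1][policy[1]][0]  # probability of moving from state 1 to state 0
    mu0, mu1 = q / (p + q), p / (p + q)
    return mu0 * R[0][policy[0]] + mu1 * R[1][policy[1]]


# Best average reward over all stationary deterministic policies.
all_policies = list(product(P[0], P[1]))
best = max(average_reward(pi) for pi in all_policies)

# Two optimal policies that disagree in every state.
pi_1, pi_2 = (0, 0), (1, 1)

# Every combination picks, per state, the action of pi_1 or of pi_2.
combinations = list(product(*zip(pi_1, pi_2)))

for pi in combinations:
    print(pi, average_reward(pi), abs(average_reward(pi) - best) < 1e-9)
```

In this instance all four combinations attain the best average reward, while the policies that use the extra action 2 in state 0 fall short of it.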

EPrint Type: Other
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics; Theory & Algorithms
ID Code: 1322
Deposited By: Ronald Ortner
Deposited On: 28 November 2005