PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Online solution of the average cost Kullback-Leibler optimization problem
Joris Bierkens and Bert Kappen
In: NIPS 2011, 4th International Workshop on Optimization for Machine Learning, 16 December 2011, Granada, Spain.


We introduce a stochastic approximation method for solving a Kullback-Leibler (KL) optimization problem; the method generalizes Z-learning [Todorov, 2007]. A KL-optimization problem is a Markov decision process with a finite state space and a continuous control space. Because the control cost has a special form involving the Kullback-Leibler divergence, the problem can be solved essentially by finding the largest eigenvalue of a non-negative matrix together with a corresponding eigenvector. The stochastic algorithm presented in this paper may be used to solve this eigenvalue problem. It allows for a sound theoretical analysis and can be shown to be comparable to the power method in terms of convergence speed. It may also serve as the basis of a reinforcement-learning-style algorithm for Markov decision problems.
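For orientation, the power method that the abstract uses as its point of comparison can be sketched as follows. This is a minimal illustration and not the stochastic algorithm of the paper. The function name `power_method`, the state costs `q`, the uncontrolled transition matrix `P`, and the construction H = diag(exp(-q)) P are assumptions borrowed from the usual presentation of Todorov's framework; none of them appears in the abstract itself.

```python
import math


def power_method(matrix, iterations=1000, tol=1e-12):
    """Estimate the largest eigenvalue of a non-negative square matrix
    and a corresponding eigenvector normalised to sum to one."""
    n = len(matrix)
    z = [1.0 / n] * n  # strictly positive start vector with sum 1
    eigenvalue = 0.0
    for _ in range(iterations):
        # One matrix-vector product: w = matrix @ z
        w = [sum(matrix[i][j] * z[j] for j in range(n)) for i in range(n)]
        total = sum(w)
        if total <= 0.0:
            raise ValueError("iterate vanished; matrix may be zero")
        # Because sum(z) == 1, sum(w) is the current eigenvalue estimate.
        z = [x / total for x in w]
        converged = abs(total - eigenvalue) < tol
        eigenvalue = total
        if converged:
            break
    return eigenvalue, z


# Hypothetical two-state example (all numbers are made up for illustration).
q = [0.1, 0.5]                  # assumed state costs
P = [[0.9, 0.1], [0.3, 0.7]]    # assumed uncontrolled transition probabilities
H = [[math.exp(-q[i]) * P[i][j] for j in range(2)] for i in range(2)]

lam, z = power_method(H)
# Under the assumed formulation, -log(lam) would be the optimal average cost.
average_cost = -math.log(lam)
```

Each iteration of this sketch needs a full matrix-vector product, so it requires the whole matrix to be known in advance. A stochastic approximation scheme, by contrast, would update its estimates from sampled transitions, which is what makes it a candidate for a reinforcement-learning setting.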

EPrint Type: Conference or Workshop Item (Paper)
Subjects: Learning/Statistics & Optimisation; Theory & Algorithms
ID Code: 8419
Deposited By: Joris Bierkens
Deposited On: 21 December 2011