Online solution of the average cost Kullback-Leibler optimization problem
Joris Bierkens and Bert Kappen
In: NIPS 2011, 4th International Workshop on Optimization for Machine Learning, 16 December 2011, Granada, Spain.
We introduce a stochastic approximation method for the solution of a Kullback-Leibler (KL) optimization problem, which generalizes the Z-learning method introduced by [Todorov, 2007]. A KL-optimization problem is a Markov decision process with a finite state space and a continuous control space. Because the control cost has a special form involving the Kullback-Leibler divergence, it can be shown that the problem may be solved essentially by finding the largest eigenvalue of a non-negative matrix and a corresponding eigenvector. The stochastic algorithm presented in this paper may be used to solve this eigenvalue problem. It allows for a sound theoretical analysis and can be shown to be comparable to the power method in terms of convergence speed. It may be used as the basis of a reinforcement-learning-style algorithm for Markov decision problems.
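For orientation, the power method that the abstract uses as its point of comparison can be sketched as follows. This is only a minimal illustration of the deterministic baseline, not the stochastic algorithm of the paper. The function name `power_method`, its parameters, and the 2x2 example matrix are assumptions chosen for illustration; the sketch further assumes a non-negative matrix whose dominant eigenvalue is strictly larger in modulus than the others.

```python
def power_method(H, iters=1000, tol=1e-12):
    """Estimate the dominant eigenvalue and eigenvector of a non-negative
    square matrix H (given as a list of rows) by repeated multiplication.

    Hypothetical helper for illustration; assumes the iterates stay non-zero.
    """
    n = len(H)
    z = [1.0 / n] * n  # positive start vector, normalized to sum to 1
    lam = 0.0
    for _ in range(iters):
        # Matrix-vector product w = H z.
        w = [sum(H[i][j] * z[j] for j in range(n)) for i in range(n)]
        # Because sum(z) == 1, sum(w) approaches the dominant eigenvalue.
        lam = sum(w)
        z_new = [x / lam for x in w]  # renormalize so the entries sum to 1
        converged = max(abs(a - b) for a, b in zip(z_new, z)) < tol
        z = z_new
        if converged:
            break
    return lam, z


# Illustrative matrix (an assumption): its eigenvalues are 4 and -1.
lam, z = power_method([[1.0, 2.0], [3.0, 2.0]])
```

For this example matrix the iteration settles on the eigenvalue 4 with the eigenvector proportional to (2, 3). Each step requires a full matrix-vector product, which is the cost an online, sample-based method would aim to avoid.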