## Abstract

We introduce a stochastic approximation method for solving Kullback-Leibler (KL) optimization problems. The method generalizes the Z-learning algorithm introduced by [Todorov, 2007]. A KL-optimization problem is a Markov decision process with a finite state space and a continuous control space. Because the control cost has a special form involving the Kullback-Leibler divergence, it can be shown that the problem may be solved essentially by finding the largest eigenvalue of a non-negative matrix together with a corresponding eigenvector. The stochastic algorithm presented in this paper may be used to solve this eigenvalue problem. It allows for a sound theoretical analysis and can be shown to be comparable to the power method in terms of convergence speed. It may also serve as the basis of a reinforcement-learning-style algorithm for Markov decision problems.
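For orientation, the power method used as the baseline for comparison can be sketched as follows. This is a minimal illustration of the classical deterministic iteration, not the stochastic algorithm of the paper. The function name `power_method`, the matrix name `H`, and the stopping parameters are assumptions made for this example, and the matrix is assumed to be non-negative and primitive so that the iteration converges.

```python
import numpy as np


def power_method(H, num_iters=1000, tol=1e-12):
    """Approximate the largest eigenvalue of a non-negative matrix H
    and a corresponding eigenvector normalized to sum to 1.

    Assumes H is primitive (e.g. strictly positive), so that the
    iteration converges and H @ z never vanishes.
    """
    H = np.asarray(H, dtype=float)
    n = H.shape[0]
    z = np.full(n, 1.0 / n)  # uniform positive starting vector, sums to 1
    lam = 0.0
    for _ in range(num_iters):
        w = H @ z
        # z sums to 1 and all entries are non-negative, so the sum of
        # w = H z converges to the dominant eigenvalue.
        lam = w.sum()
        z_new = w / lam
        converged = np.abs(z_new - z).max() < tol
        z = z_new
        if converged:
            break
    return lam, z


if __name__ == "__main__":
    H = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
    lam, z = power_method(H)
    print("eigenvalue estimate:", lam)
    print("eigenvector estimate:", z)
```

Each step requires a full matrix-vector product with `H`. A stochastic approximation scheme of the kind described in the abstract would instead work from sampled transitions, which this sketch does not attempt to reproduce.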