PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Dynamic Policy Programming by Kullback-Leibler Divergence Minimization
Mohammad Azar and Bert Kappen
In: NIPS Workshop on Probabilistic Approaches for Stochastic Optimal Control and Robotics, 11-12 December 2009, Whistler, BC, Canada.

Abstract

The poster presents a novel optimal control approach, called Dynamic Policy Programming (DPP), which operates directly on a policy distribution rather than on a value function. The advantage of DPP over value-based methods appears in combination with function approximation: the greedy policies derived from an approximate value function demand a higher accuracy of that value function than can often be realized.
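To make the idea concrete, the following is a minimal sketch of a DPP-style recursion on a toy tabular MDP. The update maintains action preferences whose softmax defines the policy, instead of a value function. The specific MDP, the inverse-temperature `eta`, and the Boltzmann-weighted average operator used here are illustrative assumptions, not details taken from the poster itself.

```python
import numpy as np

# Toy deterministic MDP (illustrative only): 2 states, 2 actions.
# P[s, a, s'] = transition probability, R[s, a] = immediate reward.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],   # state 0: a0 stays, a1 -> state 1
              [[1.0, 0.0], [0.0, 1.0]]])  # state 1: a0 -> state 0, a1 stays
R = np.array([[0.0, 0.0],
              [0.0, 1.0]])                # only (state 1, action 1) pays off
gamma, eta = 0.95, 5.0                    # discount, inverse temperature
nS, nA = R.shape

psi = np.zeros((nS, nA))  # action preferences; policy = softmax(eta * psi)

def softmax_avg(psi):
    """Boltzmann policy pi(a|s) and weighted average m(s) = sum_a pi(a|s) psi(s,a)."""
    w = np.exp(eta * (psi - psi.max(axis=1, keepdims=True)))
    pi = w / w.sum(axis=1, keepdims=True)
    return (pi * psi).sum(axis=1), pi

for _ in range(500):
    m, pi = softmax_avg(psi)
    # DPP-style recursion: update the preferences directly, so the
    # preference of suboptimal actions drifts down and the policy
    # concentrates on the optimal actions without a greedy value step.
    psi = psi - m[:, None] + R + gamma * (P @ m)

_, pi = softmax_avg(psi)
print(pi)
```

In this toy problem action 1 is optimal in both states, and the printed policy concentrates nearly all probability on it; preferences of suboptimal actions decrease at each step, so the Boltzmann policy sharpens toward the greedy one as iterations proceed.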

EPrint Type: Conference or Workshop Item (Poster)
Project Keyword: UNSPECIFIED
Subjects: Theory & Algorithms
ID Code: 8386
Deposited By: Mohammad Azar
Deposited On: 02 December 2011