Expectation-Maximization methods for solving (PO)MDPs and optimal control problems. Marc Toussaint, Amos Storkey and Stefan Harmeling. In: Inference and Learning in Dynamic Models (2011), Cambridge University Press.