PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Bandit based monte-carlo planning
L Kocsis and Csaba Szepesvari
In: 17th European Conference on Machine Learning(2006).

Abstract

For large state-space Markovian Decision Problems Monte-Carlo planning is one of the few viable approaches to find near-optimal solutions. In this paper we introduce a new algorithm, UCT, that applies bandit ideas to guide Monte-Carlo planning. In finite-horizon or discounted MDPs the algorithm is shown to be consistent and finite sample bounds are derived on the estimation error due to sampling. Experimental results show that in several domains, UCT is significantly more efficient than its alternatives.

EPrint Type:Conference or Workshop Item (Paper)
Project Keyword:Project Keyword UNSPECIFIED
Subjects:Computational, Information-Theoretic Learning with Statistics
Learning/Statistics & Optimisation
Theory & Algorithms
ID Code:6352
Deposited By:Csaba Szepesvari
Deposited On:08 March 2010