PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Speedy Q-Learning
Mohammad Azar, R. Munos, M. Ghavamzadeh and Bert Kappen
In: NIPS 2011, December 2011, Spain.

Abstract

We introduce a new convergent variant of Q-learning, called speedy Q-learning (SQL), to address the problem of slow convergence in the standard form of the Q-learning algorithm. We prove a PAC bound on the performance of SQL, which shows that only $T = O\big(\log(1/\delta)\,\epsilon^{-2}(1-\gamma)^{-4}\big)$ steps are required for the SQL algorithm to converge to an $\epsilon$-optimal action-value function with high probability. This bound has a better dependency on $1/\epsilon$ and $1/(1-\gamma)$, and thus is tighter than the best available results for Q-learning. Our bound is also superior to the existing results for both model-free and model-based instances of batch Q-value iteration, which are considered to be more sample-efficient than incremental methods like Q-learning.
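
To give a feel for how the bound in the abstract grows, the following minimal sketch evaluates only its scaling term, log(1/delta) * epsilon^-2 * (1 - gamma)^-4. It assumes that epsilon is the target accuracy, gamma the discount factor, and delta the allowed failure probability; the reading of delta is an assumption, since that symbol is not legible in this record. The function name `sql_bound_scaling` is hypothetical, and the constants hidden by the O(.) notation are omitted, so the values are relative growth rates rather than actual step counts.

```python
import math


def sql_bound_scaling(epsilon: float, gamma: float, delta: float) -> float:
    """Scaling term log(1/delta) * epsilon**-2 * (1 - gamma)**-4.

    Illustrative only: constants hidden by the O(.) notation are omitted,
    and the meaning of delta (failure probability) is an assumption.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not 0 <= gamma < 1:
        raise ValueError("gamma must lie in [0, 1)")
    if not 0 < delta < 1:
        raise ValueError("delta must lie in (0, 1)")
    return math.log(1.0 / delta) / (epsilon ** 2 * (1.0 - gamma) ** 4)


if __name__ == "__main__":
    # Compare how the term grows as the discount factor approaches 1.
    for gamma in (0.9, 0.95, 0.99):
        value = sql_bound_scaling(epsilon=0.1, gamma=gamma, delta=0.05)
        print(f"gamma={gamma}: scaling term = {value:.3e}")
```

Under these assumptions, halving epsilon multiplies the term by 4, while halving the effective horizon gap (1 - gamma) multiplies it by 16, which is why the dependency on 1/(1 - gamma) dominates for long-horizon problems.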

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics
          Learning/Statistics & Optimisation
          Theory & Algorithms
ID Code: 8306
Deposited By: Bert Kappen
Deposited On: 14 October 2011