PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Speedy Q-learning
Mohammad Gheshlaghi Azar, Rémi Munos, Mohammad Ghavamzadeh and Bert Kappen
In: Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS 2011), 12-15 Dec 2011, Granada, Spain.

Abstract

We introduce a new convergent variant of Q-learning, called speedy Q-learning (SQL), to address the problem of slow convergence in the standard form of the Q-learning algorithm. We prove a PAC bound on the performance of SQL, which shows that for an MDP with n state-action pairs and discount factor \gamma, only T = O(\log(n)/(\epsilon^{2}(1-\gamma)^{4})) steps are required for SQL to converge to an \epsilon-optimal action-value function with high probability. This bound has a better dependency on 1/\epsilon and 1/(1-\gamma), and is thus tighter than the best available result for Q-learning. Our bound is also superior to the existing results for both model-free and model-based instances of batch Q-value iteration, which are considered to be more sample-efficient than incremental methods such as Q-learning.
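The abstract describes the algorithm only at the level of its convergence guarantee. As a rough illustration, the sketch below shows a synchronous tabular rendering of the speedy Q-learning update, which combines two successive empirical Bellman backups with an aggressive (1 - alpha) weight on their difference; this is a hedged sketch, not the paper's reference implementation. It assumes access to a generative sampling model, and the helpers sample_next_state and reward (as well as all variable names) are illustrative assumptions.

import numpy as np

def speedy_q_learning(sample_next_state, reward, n_states, n_actions,
                      gamma=0.9, num_iters=1000):
    # Hedged sketch of synchronous speedy Q-learning on a tabular MDP.
    # sample_next_state(s, a) -> sampled next-state index (assumed generative model)
    # reward[s, a]            -> immediate reward for action a in state s (assumed given)
    q_prev = np.zeros((n_states, n_actions))
    q_curr = np.zeros((n_states, n_actions))

    for k in range(num_iters):
        alpha = 1.0 / (k + 1)  # decaying learning rate used in the SQL analysis
        q_next = np.zeros_like(q_curr)
        for s in range(n_states):
            for a in range(n_actions):
                y = sample_next_state(s, a)  # one sample per state-action pair
                # Empirical Bellman backups of the previous and current iterates
                t_prev = reward[s, a] + gamma * np.max(q_prev[y])
                t_curr = reward[s, a] + gamma * np.max(q_curr[y])
                # Speedy update: small step toward the old backup plus a
                # (1 - alpha)-weighted correction from the difference of backups
                q_next[s, a] = (q_curr[s, a]
                                + alpha * (t_prev - q_curr[s, a])
                                + (1.0 - alpha) * (t_curr - t_prev))
        q_prev, q_curr = q_curr, q_next
    return q_curr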

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: Subjects > COMPLACS
Subjects: Computational, Information-Theoretic Learning with Statistics
    Learning/Statistics & Optimisation
    Theory & Algorithms
ID Code: 8384
Deposited By: Mohammad Azar
Deposited On: 02 December 2011