Temporal Difference Bayesian Model Averaging: A Bayesian Perspective on Adapting Lambda
Carlton Downey and Scott Sanner
In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010.
Temporal difference (TD) algorithms are attractive for reinforcement learning due to their ease of implementation and their use of “bootstrapped” return estimates to make efficient use of sampled data. In particular, TD(λ) methods comprise a family of reinforcement learning algorithms that often yield fast convergence by averaging multiple estimators of the expected return. However, TD(λ) chooses a very specific way of averaging these estimators based on the fixed parameter λ, which may not lead to optimal convergence rates in all settings. In this paper, we derive an automated Bayesian approach to setting λ that we call temporal difference Bayesian model averaging (TD-BMA). Empirically, TD-BMA always performs at least as well as, and often much better than, the best fixed λ for TD(λ) (even when performance for different values of λ varies across problems), without requiring that λ or any analogous parameter be manually tuned.
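For context, the fixed averaging scheme the abstract refers to is the standard λ-return, a geometric mixture of n-step bootstrapped returns. The following is textbook background with assumed notation, not an excerpt from the paper:

% n-step return: n sampled rewards, then bootstrap from the value estimate
\[ G_t^{(n)} = r_{t+1} + \gamma r_{t+2} + \cdots + \gamma^{n-1} r_{t+n} + \gamma^n V(s_{t+n}) \]
% lambda-return: geometric weights (1 - lambda) * lambda^{n-1}, which sum to 1
\[ G_t^{\lambda} = (1 - \lambda) \sum_{n=1}^{\infty} \lambda^{n-1} G_t^{(n)} \]

Each n-step return G_t^{(n)} is itself an estimator of the expected return; TD(λ) fixes the mixture weights a priori through λ, whereas TD-BMA, per the abstract, derives the weighting over these estimators automatically via Bayesian model averaging.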