PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Change Point Detection and meta-bandits for online learning in dynamic environments
Cedric Hartland, Nicolas Baskiotis, Sylvain Gelly, Olivier Teytaud and Michele Sebag
CAp 07 2007.

Abstract

Motivated by real-time website optimization, this paper addresses online learning in abruptly changing environments, formalized as the dynamic multi-armed bandit problem. This problem highlights the Exploration vs Exploitation trade-off: on the one hand, maximizing the reward based on current knowledge gathered from past trials; on the other hand, finding the arm that is actually best at a given time, given that rewards change over time. Two extensions of the UCBT algorithm are combined to handle dynamic multi-armed bandits, and specifically to cope with fast variations in the rewards. First, a change point detection test based on Page-Hinkley (PH) statistics is used to overcome the inertia of UCBT when some changes occur. Second, a controlled forgetting strategy dubbed Meta-Bandit is proposed to manage the Exploration vs Exploitation trade-off when the PH test is triggered. Extensive empirical validation shows significant improvements over the baseline algorithms on a PASCAL challenge benchmark proposed by Touch Clarity.
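
To illustrate the change point detection step mentioned in the abstract, here is a minimal sketch of a Page-Hinkley test that watches a reward stream for a drop in its mean. It follows a common textbook formulation and is not taken from the paper. The class name, the helper first_alarm, and the values of delta (tolerated change magnitude) and threshold (alarm level, often written lambda) are assumptions chosen for illustration only.

```python
class PageHinkley:
    """Page-Hinkley test for a downward shift in the mean of a stream.

    Illustrative sketch; parameter values are assumptions, not the paper's.
    """

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta = delta          # tolerated magnitude of change (assumed)
        self.threshold = threshold  # alarm threshold, "lambda" (assumed)
        self.reset()

    def reset(self):
        self.n = 0        # number of observations seen so far
        self.mean = 0.0   # running mean of the observations
        self.m = 0.0      # cumulative deviation from the running mean
        self.m_max = 0.0  # running maximum of the cumulative deviation

    def update(self, x):
        """Feed one observation; return True if a drop in mean is flagged."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.m += x - self.mean + self.delta
        self.m_max = max(self.m_max, self.m)
        # A large gap between the maximum and the current value signals a drop.
        return (self.m_max - self.m) > self.threshold


def first_alarm(stream, detector):
    """Return the index of the first alarm, or None if none is raised."""
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None
```

In a bandit setting, one such detector could be attached to the reward stream of the currently best arm, and an alarm would then prompt the learner to reconsider its statistics. How that reconsideration is performed (the Meta-Bandit strategy) is described in the paper itself and is not reproduced here.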

EPrint Type: Article
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Learning/Statistics & Optimisation
ID Code: 3678
Deposited By: Cedric Hartland
Deposited On: 14 February 2008