Efficient tracking of the best of many experts
In the framework of prediction of individual sequences, one seeks sequential prediction methods that perform (asymptotically) as well as the best expert from a given class. We consider the powerful class of switching strategies, which segment the given sequence into several blocks and follow the advice of possibly different "base" experts in different blocks. Our goal is to provide efficient prediction strategies even when the set of base experts is large. Earlier work on this problem produced algorithms whose cumulative regret on a sequence of length n (with respect to the class of switching predictors) is of order log(n) for each switch of the best "meta"-expert. These bounds hold for large base expert classes whose size is polynomial in n (and even for some special infinite expert classes) and for a large class of loss functions, including exp-concave losses. However, these algorithms either have complexity that is linear in both the sequence length and the number of base experts, and thus cannot handle large base expert classes, or they admit efficient implementations for large expert classes at the price of complexity quadratic in n. In this work we give an algorithm that aims to combine the advantages of these two approaches. The new algorithm has time complexity of order n log(n) and cumulative regret of order log^2(n) per switch, and it is particularly suitable for large base expert classes. It generalizes a low-complexity algorithm by Willems for stochastic sources under the logarithmic loss. Among other applications, our method can be used to improve the performance-complexity trade-off in sequential lossless or limited-delay lossy source coding, and in sequential routing algorithms.
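The abstract does not spell out the construction, so the following Python sketch is only a rough illustration of how the number of simultaneously running base-algorithm copies can be kept logarithmic in time. A fresh copy is started at every step t, and the copy started at time s is kept for 2^v steps, where 2^v is the largest power of two dividing s (the copy started at time 0 is kept forever); at most one copy per dyadic level is then alive. Everything here is an assumption made for illustration: the helper `lifetime`, the classes `Hedge` and `DyadicRestartTracker`, the linear loss with losses in [0, 1], the learning rate `eta`, and the fixed-share-style prior `alpha` are hypothetical. This is neither the paper's algorithm nor Willems' original schedule, and it carries none of the stated regret guarantees.

```python
import math
from typing import Dict, List


def lifetime(start: int) -> float:
    """Hypothetical schedule: the copy started at time `start` lives for
    2**v steps, where 2**v is the largest power of two dividing `start`.
    The copy started at time 0 is never discarded."""
    if start == 0:
        return math.inf
    return float(start & -start)  # lowest set bit of start, i.e. 2**v


class Hedge:
    """Exponentially weighted average forecaster over N base experts
    (an assumed stand-in for the base algorithm; losses lie in [0, 1])."""

    def __init__(self, n_experts: int, eta: float):
        self.eta = eta
        self.log_w = [0.0] * n_experts  # log-weights avoid underflow

    def weights(self) -> List[float]:
        m = max(self.log_w)
        w = [math.exp(x - m) for x in self.log_w]
        total = sum(w)
        return [x / total for x in w]

    def loss(self, losses: List[float]) -> float:
        # Loss of the weighted-average forecast under a linear loss.
        return sum(p * l for p, l in zip(self.weights(), losses))

    def update(self, losses: List[float]) -> None:
        self.log_w = [x - self.eta * l for x, l in zip(self.log_w, losses)]


class DyadicRestartTracker:
    """Hypothetical tracker: start a fresh base copy at every step, keep
    only copies whose lifetime has not expired, and mix the live copies
    with exponential weights plus a fixed-share-style prior alpha."""

    def __init__(self, n_experts: int, eta: float, alpha: float):
        assert 0.0 < alpha < 1.0
        self.n, self.eta, self.alpha = n_experts, eta, alpha
        self.t = 0
        self.copies: Dict[int, Hedge] = {}  # start time -> base copy
        self.meta: Dict[int, float] = {}    # start time -> meta-weight

    def step(self, losses: List[float]) -> float:
        # 1. Discard expired copies; including the new copy added below,
        #    at most self.t.bit_length() + 1 copies are then alive.
        for s in [s for s in self.copies if self.t >= s + lifetime(s)]:
            del self.copies[s], self.meta[s]
        # 2. Rescale survivors to total mass 1 - alpha; the new copy gets alpha.
        share = self.alpha if self.meta else 1.0
        if self.meta:
            total = sum(self.meta.values())
            for s in self.meta:
                prior = self.meta[s] / total if total > 0 else 1.0 / len(self.meta)
                self.meta[s] = (1.0 - share) * prior
        self.copies[self.t] = Hedge(self.n, self.eta)
        self.meta[self.t] = share
        # 3. Suffer the meta-weighted loss of the live copies.
        copy_loss = {s: c.loss(losses) for s, c in self.copies.items()}
        mixed = sum(self.meta[s] * copy_loss[s] for s in self.copies)
        # 4. Exponential-weights update of the meta-weights and of each copy.
        for s in self.meta:
            self.meta[s] *= math.exp(-self.eta * copy_loss[s])
        z = sum(self.meta.values())
        for s in self.meta:
            self.meta[s] /= z
        for c in self.copies.values():
            c.update(losses)
        self.t += 1
        return mixed
```

In this sketch, each live copy costs O(N) work per step for N base experts, and at most t.bit_length() + 1 copies are alive at time t, so n steps take roughly O(N n log n) work. Keeping every restarted copy forever would instead cost O(N n^2). On a sequence where the best base expert switches once, the freshly restarted copies let the tracker recover much faster than a single `Hedge` run from time 0.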