Combining Strategies Efficiently: High-Quality Decisions from Conflicting Advice
In this dissertation we study machine learning: the automated discovery and exploitation of regularities in data. We may use regularities identified in objects to explain the past (e.g. archaeology, justice), as well as regularities found in processes to predict the future (e.g. weather, the stock market) and guide our actions.

With ubiquitous computational resources, machine learning algorithms have become pervasive. For example, they manage financial portfolios and power-saving policies, provide personalised movie recommendations as well as advertisements, and form the core of state-of-the-art data compression software.

This dissertation develops the theory of online learning, a branch of machine learning that investigates sequential decision problems with immediate feedback. In particular, we study the setting called prediction with expert advice. Our task is to predict a sequence of data. In each trial, we may first consult a given set of experts. We then combine their advice and issue our prediction of the next outcome. Finally, the next outcome is revealed, and we incur loss based on the discrepancy between our prediction and the outcome. The goal is to build efficient algorithms with small regret, i.e. the difference between our cumulative loss and the loss of the best strategy in hindsight from a fixed reference class. In this sense, the strategies in the reference class are the patterns, and achieving small regret means learning which reference strategy best models the data. The main difference between the learning problems we consider is the complexity of the reference set. Algorithms for prediction with expert advice have many applications, including classification, regression, hypothesis testing, model selection, data compression, gambling and investing in the stock market.

In Chapter 2 we give a game-theoretic analysis of the simplest online learning problem: the prediction of a sequence of binary outcomes under 0/1 loss with the help of two experts.
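To make the protocol above concrete, the following toy sketch runs the prediction loop with two experts on a binary sequence under absolute loss and records the resulting regret. It uses exponentially weighted averaging as the combination rule; this is an illustration of the setting only, not the minimax strategy computed in Chapter 2, and all names and parameters here are hypothetical.

```python
import math

def hedge_predict(expert_preds, losses_so_far, eta=0.5):
    """Mix the experts' predictions by exponential weights on past cumulative loss."""
    weights = [math.exp(-eta * L) for L in losses_so_far]
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, expert_preds)) / total

# Toy run: two experts predicting a binary sequence.
outcomes = [1, 1, 0, 1, 0, 1, 1, 1]
experts = [lambda t: 1, lambda t: t % 2]   # "always one" vs. "alternate"
cum_loss = [0.0, 0.0]                      # cumulative loss of each expert
our_loss = 0.0
for t, y in enumerate(outcomes):
    preds = [e(t) for e in experts]
    p = hedge_predict(preds, cum_loss)     # combine the advice
    our_loss += abs(p - y)                 # our loss on the revealed outcome
    for i, q in enumerate(preds):
        cum_loss[i] += abs(q - y)          # feedback to each expert's tally
regret = our_loss - min(cum_loss)          # excess over best expert in hindsight
```

Small regret means the forecaster's cumulative loss is almost as low as that of the better of the two experts, even though it does not know in advance which one that is.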
For this simple problem, we compute the minimax, i.e. game-theoretically optimal, regret, and show how to implement the optimal strategy efficiently. We then give special attention to the case that one of the experts is good. We conclude with a new result: the optimal algorithm for competing with the set of meta-experts that switch between the two basic experts.

In Chapter 3 we show how models for prediction with expert advice can be defined concisely and clearly using hidden Markov models (HMMs); standard algorithms can then be used to efficiently calculate how the expert predictions should be weighted. We focus on algorithms for tracking the best expert. Here the strategies in the reference set follow the advice of a single expert, but this expert may change between trials. We cast existing models as HMMs, starting from the fixed share algorithm, recover the running times and regret bounds for each algorithm, and discuss how they are related. We also describe three new models for switching between experts.

In Chapter 4 we extend the setting to tracking the best learning expert. Whereas vanilla experts can be tapped for advice about the current trial, learning experts may be queried for advice given each possible subset of the past data. This additional power is available to both the algorithm and the reference strategies. Achieving small regret thus means learning how to partition the trials, and which learning expert to train and follow within each partition cell. We give efficient algorithms with small regret for tracking learning experts that can themselves be formalised using the expert HMMs of Chapter 3.

In Chapter 5 we consider reference strategies that switch between two experts based on their cumulative loss instead of on time. This chapter is formulated in financial terms to make the presentation more intuitive. We present a simple online two-way trading algorithm that exploits fluctuations in the unit price of an asset.
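The intuition behind profiting from fluctuations can be conveyed by a toy computation: a portfolio rebalanced to a fixed 50/50 split between cash and the asset grows whenever the price oscillates, even if the price ends where it started. This sketch is illustrative only; the strategy and parameters below are hypothetical, not the algorithm analysed in Chapter 5.

```python
# A 50/50 constantly rebalanced portfolio of cash and one asset.
# The price repeatedly halves, recovers, doubles and falls back.
prices = [1.0, 0.5, 1.0, 2.0] * 5 + [1.0]
cash, units = 0.5, 0.5                 # initial wealth 1.0 at unit price 1.0
for p in prices[1:]:
    wealth = cash + units * p          # mark portfolio to the new price
    cash = wealth / 2                  # rebalance: half in cash,
    units = wealth / (2 * p)           # half in the asset
final_wealth = cash + units * prices[-1]
# The price returns to 1.0, yet wealth has more than tripled.
```

Each down-and-up swing forces the rebalancer to buy low and sell high, which is the fluctuation being exploited; a buy-and-hold strategy on this price path would end with its initial wealth.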
Rather than analysing worst-case performance under some assumptions, we prove a novel, unconditional performance bound that is parameterised either by the actual dynamics of the price of the asset, or by a simplifying model thereof. We discuss applications of the results to prediction with expert advice, data compression and hypothesis testing.

In Chapter 6 we consider prediction with structured concepts. Each round we select a concept, which is composed of components. The loss of a concept is the sum of the losses of its components. Whereas the losses of different components are independent, the losses of different concepts are highly related. We develop an online algorithm, called Component Hedge, that exploits this dependence and thereby avoids the so-called range factor that arises when the dependences are ignored. We show that Component Hedge has optimal regret bounds for a large variety of structured concept classes.
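The additive loss structure can be illustrated with a small example, here with concepts taken to be k-element subsets of n components (a hypothetical instance; this sketch shows only the setting, not the Component Hedge algorithm). While each component loss lies in [0, 1], concept losses range over [0, k]; this factor-of-k blow-up is the range factor that an algorithm ignoring the dependence pays for, and there are far more concepts than components.

```python
from itertools import combinations

# Structured concepts: all k-element subsets of n components; a concept's
# loss is the sum of its components' losses.
n, k = 6, 3
component_loss = [0.2, 0.9, 0.1, 0.5, 0.8, 0.3]
concepts = list(combinations(range(n), k))       # 20 concepts, only 6 components
concept_loss = {c: sum(component_loss[i] for i in c) for c in concepts}
best = min(concept_loss, key=concept_loss.get)   # picks the cheapest components
```

Maintaining one weight per concept treats these 20 highly related losses as unrelated; exploiting the component structure instead needs only one weight per component.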