Easy learning theory for highly scalable learning algorithms (tutorial)
In: The 22nd International Conference on Machine Learning (ICML 2005), 7-11 August, 2005, Bonn, Germany.
We will present a survey of recent advances in on-line prediction algorithms and their connections to statistical methods in Machine Learning. An on-line learning algorithm is an incremental algorithm that sweeps through a sequence of examples only once. As such, on-line algorithms are both highly adaptive and highly scalable. They seem to be an unavoidable choice when dealing with very large datasets and/or when the datasets change rapidly over time. In the worst-case setting of on-line learning, prediction performance is proven via a pointwise analysis, i.e., an analysis that avoids statistical assumptions on the way data are generated. Recent developments in the on-line learning literature show a natural way of turning on-line algorithms working in the worst-case setting into batch algorithms working under more standard stochastic (e.g., i.i.d.) assumptions on the data-generation process. The underlying statistical theory is fairly "easy", while the resulting algorithms are "highly scalable" (hence the title of this tutorial).
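To make the single-pass setting concrete, here is a minimal sketch of one classic mistake-driven on-line algorithm, the perceptron, together with one simple online-to-batch conversion (averaging the sequence of weight vectors). The abstract does not name specific algorithms or a specific conversion, so both choices are illustrative assumptions:

```python
def perceptron_online(stream, dim):
    # Classic perceptron run in a single pass over the example stream:
    # constant memory and one cheap update per example (hence "highly scalable").
    # We also keep a running sum of the weight vectors; returning their average
    # is one simple online-to-batch conversion (an illustrative choice, not
    # the one the tutorial necessarily presents).
    w = [0.0] * dim          # current weight vector
    w_sum = [0.0] * dim      # running sum of weight vectors
    t = 0                    # number of examples processed
    mistakes = 0             # worst-case analyses bound exactly this count
    for x, y in stream:      # labels y are assumed to be -1 or +1
        if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
            # Prediction mistake: update the hypothesis.
            w = [wi + y * xi for wi, xi in zip(w, x)]
            mistakes += 1
        w_sum = [s + wi for s, wi in zip(w_sum, w)]
        t += 1
    w_avg = [s / t for s in w_sum]  # averaged hypothesis for batch use
    return w_avg, mistakes
```

The mistake counter is the quantity the worst-case (pointwise) analysis controls, while the averaged vector is the kind of hypothesis whose batch risk the online-to-batch theory bounds.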
The error bounds one can derive from this theory are both data-dependent and algorithm-dependent. Moreover, and most importantly, these bounds are trivial to compute and are likely to be sharp (at least as sharp as those in the existing literature on, say, Support Vector Machines). There are two immediate practical benefits of this theory:
1. It delivers efficient and adaptive learning algorithms that achieve the state-of-the-art error bounds obtained by computationally intensive methods;
2. It allows sharp and rigorous error reporting.
A broader goal of this tutorial is to bridge the Learning Theory and Machine Learning communities, and we hope to contribute to increasing the cooperation between the two. In this tutorial, we first review some of the most relevant (and recent) on-line algorithms in the worst-case setting, then discuss how to adapt these algorithms to a more classical statistical learning framework. Some of the results presented are well known (and have been widely publicized); others are not and, we believe, deserve more attention from machine learning practitioners.