
Access to Unlabeled Data can Speed up Prediction Time
Ruth Urner, Shai Ben-David and Shai Shalev-Shwartz
In: ICML 2011, June 2011, Bellevue, Washington, USA.

Abstract

Semi-supervised learning (SSL) addresses the problem of training a classifier using a small number of labeled examples and many unlabeled examples. Most previous work on SSL focused on how availability of unlabeled data can improve the accuracy of the learned classifiers. In this work we study how unlabeled data can be beneficial for constructing faster classifiers. We propose an SSL algorithmic framework which can utilize unlabeled examples for learning classifiers from a predefined set of fast classifiers. We formally analyze conditions under which our algorithmic paradigm obtains significant improvements by the use of unlabeled data. As a side benefit of our analysis we propose a novel quantitative measure of the so-called cluster assumption. We demonstrate the potential merits of our approach by conducting experiments on the MNIST data set, showing that, when a sufficiently large unlabeled sample is available, a fast classifier can be learned from much fewer labeled examples than without such a sample.
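The abstract does not spell out the algorithm, but one natural instantiation of the "use unlabeled data to obtain a fast classifier" idea is to let an accurate but slow predictor, trained on the small labeled sample, label the large unlabeled pool, and then fit a fast predictor to that pseudo-labeled pool. The sketch below illustrates this pattern only; the dataset (scikit-learn digits), the choice of a kernel SVM as the slow model and a shallow decision tree as the fast model, and the split sizes are all illustrative assumptions, not the setup used in the paper.

    # Minimal sketch: pseudo-label a large unlabeled pool with a slow, accurate model,
    # then train a prediction-time-fast model on it. Illustrative, not the paper's method.
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0
    )

    # Pretend only 100 of the training points are labeled; the rest form the unlabeled pool.
    X_lab, X_unlab, y_lab, _ = train_test_split(
        X_train, y_train, train_size=100, random_state=0
    )

    # Accurate but slow at prediction time (kernel SVM: cost grows with support vectors).
    slow = SVC(kernel="rbf", gamma="scale").fit(X_lab, y_lab)

    # Use the slow model to pseudo-label the unlabeled pool.
    pseudo_labels = slow.predict(X_unlab)

    # Fast at prediction time (shallow tree), trained on the pseudo-labeled pool.
    fast = DecisionTreeClassifier(max_depth=8, random_state=0).fit(X_unlab, pseudo_labels)

    print("slow model test accuracy:", slow.score(X_test, y_test))
    print("fast model test accuracy:", fast.score(X_test, y_test))

The point of such a construction is that the fast model's accuracy is limited by the pseudo-labeler, yet it can be far cheaper to evaluate; without the unlabeled pool, the fast class would have to be trained on the 100 labeled points alone.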

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Learning/Statistics & Optimisation; Theory & Algorithms
ID Code: 8912
Deposited By: Shai Shalev-Shwartz
Deposited On: 21 February 2012