PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Worst-case analysis of selective sampling for linear-threshold algorithms
Nicolò Cesa-Bianchi, Claudio Gentile and Luca Zaniboni
Journal of Machine Learning Research Volume 7, pp. 1205-1230, 2006.


A selective sampling algorithm is a learning algorithm for classification that, based on the past observed data, decides whether to query the label of each new instance to be classified. In this paper, we introduce a general technique for turning linear-threshold classification algorithms from the general additive family into randomized selective sampling algorithms. For the most popular algorithms in this family we derive mistake bounds that hold for individual sequences of examples. These bounds show that our semi-supervised algorithms can achieve, on average, the same accuracy as their fully supervised counterparts while using fewer labels. Our theoretical results are corroborated by a number of experiments on real-world textual data. The outcome of these experiments is essentially predicted by our theoretical results: our selective sampling algorithms tend to perform as well as the algorithms that receive the true label after each classification, while in practice observing substantially fewer labels.
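The abstract does not state the sampling rule. As a minimal sketch, assuming a randomized rule of the kind studied in this line of work, a selective sampling Perceptron can compute the margin p = w·x, predict sign(p), and then query the label only with probability b/(b + |p|), where b > 0 is a tunable parameter. Small-margin (low-confidence) predictions are therefore queried more often, and the weights are updated only on queried mistakes. The function name, the parameter name `b` and the toy stream below are hypothetical illustrations, not the paper's exact algorithm or notation.

```python
import random
from typing import List, Sequence, Tuple

# (instance vector, label in {-1, +1})
Example = Tuple[Sequence[float], int]


def selective_sampling_perceptron(
    stream: Sequence[Example], b: float, seed: int = 0
) -> Tuple[List[float], int, int]:
    """Hypothetical sketch of a randomized selective sampling Perceptron.

    Returns (final weights, prediction mistakes, labels queried).
    The mistake count uses the true label for evaluation only; the
    learner itself never sees a label it did not query.
    """
    if b <= 0:
        raise ValueError("b must be positive")
    if not stream:
        return [], 0, 0
    rng = random.Random(seed)
    w = [0.0] * len(stream[0][0])
    mistakes = queries = 0
    for x, y in stream:
        margin = sum(wi * xi for wi, xi in zip(w, x))
        y_hat = 1 if margin >= 0 else -1
        if y_hat != y:
            mistakes += 1
        # Assumed sampling rule: query with probability b / (b + |margin|).
        if rng.random() < b / (b + abs(margin)):
            queries += 1
            # Standard additive (Perceptron) update, only on queried mistakes.
            if y_hat != y:
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w, mistakes, queries


if __name__ == "__main__":
    # Toy linearly separable stream, repeated to give the learner time to converge.
    stream = [([1.0, 0.5], 1), ([-1.0, -0.2], -1), ([0.8, 1.0], 1)] * 50
    for b in (0.1, 10.0):
        w, m, q = selective_sampling_perceptron(stream, b)
        print(f"b={b}: mistakes={m}, labels queried={q} of {len(stream)}")
```

Under this assumed rule, a smaller b makes the learner more reluctant to ask for labels, so one would expect fewer queries at the cost of possibly more mistakes; a larger b approaches the fully supervised Perceptron.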

EPrint Type: Article
Project Keyword: UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics
Learning/Statistics & Optimisation
Theory & Algorithms
Information Retrieval & Textual Information Access
ID Code: 265
Deposited By: Claudio Gentile
Deposited On: 23 November 2004