PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Semi-Supervised Model Selection Based on Cross-Validation
Matti Kääriäinen
(2005) Technical Report. International Computer Science Institute, Berkeley, USA.


We propose a new semi-supervised model selection method that is derived by applying the structural risk minimization principle to a recent semi-supervised generalization error bound. This bound that we build on is based on the cross-validation estimate underlying the popular cross-validation model selection heuristic. Thus, the proposed semi-supervised method is closely connected to cross-validation which makes studying these methods side by side very natural. We evaluate the performance of the proposed method and the cross-validation heuristic empirically on the task of selecting the parameters of support vector machines. The experiments indicate that the models selected by the two methods have roughly the same accuracy. However, whereas the cross-validation heuristic only proposes which classifier to choose, the semi-supervised method provides also a reliable and reasonably tight generalization error guarantee for the chosen classifier. Thus, when unlabeled data is available, the proposed semi-supervised method seems to have an advantage when reliable error guarantees are called for. In addition to the empirical evaluation, we also analyze the theoretical properties of the proposed method and prove that under suitable conditions it converges to the optimal model.

PDF - Requires Adobe Acrobat Reader or other PDF viewer.
EPrint Type:Monograph (Technical Report)
Project Keyword:Project Keyword UNSPECIFIED
Subjects:Computational, Information-Theoretic Learning with Statistics
Theory & Algorithms
ID Code:1133
Deposited By:Matti Kääriäinen
Deposited On:20 October 2005