Semi-Supervised Model Selection Based on Cross-Validation
International Computer Science Institute, Berkeley, USA.
We propose a new semi-supervised model selection method that is
derived by applying the structural risk minimization principle to a
recent semi-supervised generalization error bound. The bound is based on
the cross-validation estimate that underlies the popular cross-validation
model selection heuristic. The proposed semi-supervised method is therefore
closely connected to cross-validation, which makes it natural to study the
two methods side by side.
We evaluate the performance of the proposed method and the cross-validation
heuristic empirically on the task of selecting the parameters of support
vector machines. The experiments indicate that the models selected by the two
methods have roughly the same accuracy. However, whereas the
cross-validation heuristic only suggests which classifier to choose, the
semi-supervised method also provides a reliable and reasonably tight
generalization error guarantee for the chosen classifier.
Thus, if unlabeled data is available, the proposed semi-supervised
method appears preferable whenever reliable error guarantees
are called for. In addition to the empirical evaluation, we analyze
the theoretical properties of the proposed method and prove that under
suitable conditions it converges to the optimal model.
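
For orientation, the cross-validation heuristic that serves as the baseline above can be sketched as follows. This is only a minimal illustration of the standard k-fold procedure for choosing a support vector machine parameter, not the proposed semi-supervised method and not the experimental setup of the paper. Everything in it is an assumption made for the example: the function names (train_linear_svm, cv_error, select_C), the toy data, the parameter grid, and the simple Pegasos-style subgradient trainer that stands in for a full SVM solver.

```python
import random


def train_linear_svm(X, y, C, epochs=50):
    """Pegasos-style subgradient descent on the regularized hinge loss.

    Simplified stand-in for a real SVM solver (assumption for this sketch);
    the bias term is updated without regularization.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    lam = 1.0 / (C * len(X))  # regularization strength derived from C
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink weights
            if margin < 1.0:  # hinge loss is active for this example
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b


def cv_error(X, y, C, k=5):
    """k-fold cross-validation estimate of the error rate for parameter C."""
    n = len(X)
    folds = [list(range(i, n, k)) for i in range(k)]
    mistakes = 0
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in range(n) if i not in held]
        w, b = train_linear_svm([X[i] for i in train_idx],
                                [y[i] for i in train_idx], C)
        for i in held_out:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            pred = 1 if score >= 0.0 else -1
            mistakes += int(pred != y[i])
    return mistakes / n


def select_C(X, y, grid, k=5):
    """Return the grid value with the smallest cross-validation error."""
    errors = {C: cv_error(X, y, C, k) for C in grid}
    best = min(grid, key=lambda C: errors[C])
    return best, errors


# Toy data (hypothetical): two Gaussian blobs with labels +1 and -1.
rng = random.Random(0)
X, y = [], []
for i in range(40):
    label = 1 if i % 2 == 0 else -1
    X.append([rng.gauss(2.0 * label, 1.0), rng.gauss(2.0 * label, 1.0)])
    y.append(label)

grid = [0.01, 0.1, 1.0, 10.0]
best_C, errors = select_C(X, y, grid)
print("selected C:", best_C)
print("cross-validation errors:", errors)
```

Note that the heuristic returns only the selected parameter and its cross-validation error estimate; by itself that estimate is not a generalization error guarantee, which is the gap the semi-supervised method is meant to address.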