PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Generalization to Unseen Cases
Teemu Roos, Peter Grünwald, Petri Myllymäki and Henry Tirri
In: NIPS 2005, 5-11 Dec 2005, Vancouver, Canada.

Abstract

We analyse classification error on unseen cases, i.e. cases that are different from those in the training set. Unlike standard generalization error, this off-training set error may differ significantly from the empirical error with high probability even with large sample sizes. We derive a data-dependent bound on the difference between off-training set and standard generalization error. Our result is based on a new bound on the missing mass, which for small samples is stronger than existing bounds based on Good-Turing estimation. As we demonstrate on the UCI data sets, our bound gives nontrivial generalization guarantees in many practical cases. In light of these results, we show that certain claims made in the No Free Lunch literature are overly pessimistic.
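
For context, the missing mass is the total probability of the cases that do not occur in the sample, and the classical Good-Turing estimate of it is the fraction of the sample made up of items seen exactly once. The sketch below shows only that classical estimate, not the new bound derived in the paper; the function name `good_turing_missing_mass` and the convention for an empty sample are assumptions made for illustration.

```python
from collections import Counter


def good_turing_missing_mass(sample):
    """Classical Good-Turing estimate of the missing mass.

    Returns (number of distinct items seen exactly once) / (sample size).
    The name and the empty-sample convention are illustrative assumptions.
    """
    n = len(sample)
    if n == 0:
        # Nothing observed yet: treat all probability mass as unseen.
        return 1.0
    counts = Counter(sample)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / n


if __name__ == "__main__":
    # "b" and "c" each occur once in a sample of size 4.
    print(good_turing_missing_mass(["a", "b", "a", "c"]))
```

A large estimate suggests that a new case is likely to be one not seen in training, which is the situation in which off-training set error and standard generalization error are close.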

EPrint Type: Conference or Workshop Item (Poster)
Additional Information: A version of this paper won the BNAIC-2005 Best Paper Award.
Subjects: Computational, Information-Theoretic Learning with Statistics; Learning/Statistics & Optimisation; Natural Language Processing
ID Code: 1289
Deposited By: Peter Grünwald
Deposited On: 28 November 2005