Novelty detection: Unlabeled data definitely help
Clayton Scott and Gilles Blanchard
In: AISTATS 2009, 16-18 Apr 2009, Clearwater Beach, USA.
In machine learning, one formulation of the novelty detection problem is
to build a detector based on a training sample consisting only of nominal
data. The standard (inductive) approach to this problem has been to
declare novelties where the nominal density is low, which reduces the
problem to density level set estimation. In this paper, we consider the
setting where an unlabeled and possibly contaminated sample is also
available at learning time. We argue that novelty detection is naturally
solved by a general reduction to a binary classification problem. In
particular, a detector with a desired false positive rate can be achieved
through a reduction to Neyman-Pearson classification. Unlike the inductive
approach, our approach yields detectors that are optimal (e.g.,
statistically consistent) regardless of the distribution on novelties.
Therefore, in novelty detection, unlabeled data have a substantial impact
on the theoretical properties of the decision rule.
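
The reduction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the logistic-regression scorer, the hold-out split, the quantile-based threshold, the Gaussian toy data, and all function names (fit_logistic, fit_detector, predict) are assumptions chosen for illustration. The idea shown is to label the nominal sample 0 and the unlabeled (contaminated) sample 1, train any binary classifier on the two, and then set the decision threshold on held-out nominal data so that roughly a fraction alpha of nominal points are flagged.

```python
import numpy as np

def add_bias(X):
    # Append a constant column so the linear model has an intercept.
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_logistic(X, y, lr=0.1, n_iter=500):
    # Plain gradient-descent logistic regression; a stand-in for any
    # binary classifier that outputs a real-valued score.
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def score(w, X):
    # Larger scores mean "looks more like the unlabeled sample".
    return add_bias(X) @ w

def fit_detector(nominal, unlabeled, alpha=0.05, seed=0):
    # Split the nominal sample: one half trains the classifier, the other
    # half calibrates the threshold for the target false positive rate.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(nominal))
    half = len(nominal) // 2
    train, hold = nominal[idx[:half]], nominal[idx[half:]]
    X = np.vstack([train, unlabeled])
    y = np.concatenate([np.zeros(len(train)), np.ones(len(unlabeled))])
    w = fit_logistic(X, y)
    # Empirical (1 - alpha) quantile of held-out nominal scores.
    threshold = np.quantile(score(w, hold), 1.0 - alpha)
    return w, threshold

def predict(w, threshold, X):
    # True marks a declared novelty.
    return score(w, X) > threshold

# Toy data (assumed): nominal ~ N(0, I); novelties ~ N((4, 4), I);
# the unlabeled sample is 70% nominal and 30% novel.
rng = np.random.default_rng(1)
nominal = rng.normal(size=(2000, 2))
unlabeled = np.vstack([rng.normal(size=(700, 2)),
                       rng.normal(loc=4.0, size=(300, 2))])

w, threshold = fit_detector(nominal, unlabeled, alpha=0.05)
fpr = predict(w, threshold, rng.normal(size=(2000, 2))).mean()
tpr = predict(w, threshold, rng.normal(loc=4.0, size=(2000, 2))).mean()
print("false positive rate:", fpr, "detection rate:", tpr)
```

The threshold is calibrated only on nominal data, so the false positive rate is controlled without knowing anything about the novelty distribution. The unlabeled sample influences only the direction in which the classifier looks for novelties.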