Partitioning of Image Datasets using Discriminative Context Information
In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 24-26 Jun 2008, Anchorage, USA.
We propose Discriminative Context Partitioning (DCP), a new
method to partition an unlabeled dataset.
It is motivated by the idea of splitting the dataset based only
on how well the resulting parts can be separated from a
context class of disjoint data points. This is in contrast to
typical clustering techniques like K-means, which are based
on a generative model and implicitly or explicitly search
for modes in the distribution of samples.
The discriminative criterion in DCP avoids the problems
that density-based methods have when the a priori
assumption of multimodality is violated, when the number
of samples becomes small in relation to the dimensionality
of the feature space, or when the cluster sizes are strongly
unbalanced. We formulate DCP's separation property as
a large-margin criterion, and show how the resulting
optimization problem can be solved efficiently. Experiments
on the MNIST and USPS datasets of handwritten digits and on
a subset of the Caltech256 dataset show that, given a suitable
context, DCP can achieve good results even in situations
where density-based clustering techniques fail.
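
The separation idea can be illustrated with a toy sketch. This is not the formulation or the optimization procedure of the paper. It assumes one-dimensional data, a hard margin, a brute-force search over two-way splits, and a score equal to the smaller of the two margins. The names margin_1d and best_two_way_partition are hypothetical and chosen only for this example.

```python
from itertools import combinations


def margin_1d(part, context):
    """Hard margin of the best threshold separating `part` from `context`
    on the real line; 0.0 if no single threshold separates them.
    (Toy assumption: 1D data, so the margin is half the gap.)"""
    gap_left = min(context) - max(part)    # part lies entirely left of context
    gap_right = min(part) - max(context)   # part lies entirely right of context
    return max(gap_left, gap_right, 0.0) / 2.0


def best_two_way_partition(data, context):
    """Brute-force search for the two-way split whose parts are both
    separated from the context with the largest worst-case margin.
    (Assumption: exhaustive search, feasible only for tiny datasets.)"""
    best_score, best_split = -1.0, None
    n = len(data)
    for k in range(1, n):
        for idx in combinations(range(n), k):
            if 0 not in idx:
                continue  # keep point 0 in part_a to skip mirrored splits
            part_a = [data[i] for i in idx]
            part_b = [data[i] for i in range(n) if i not in idx]
            score = min(margin_1d(part_a, context), margin_1d(part_b, context))
            if score > best_score:
                best_score, best_split = score, (part_a, part_b)
    return best_score, best_split


# Unlabeled data on both sides of a context class that sits in the middle.
data = [-3.0, -2.5, 2.0, 3.0]
context = [0.0, 0.5]
score, (part_a, part_b) = best_two_way_partition(data, context)
```

In this example no single threshold separates the whole dataset from the context, but the split into the left and right groups does, so the score depends only on separability from the context and not on the density of the data itself.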