PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Consistent Minimization of Clustering Objective Functions
Ulrike v. Luxburg, Sebastien Bubeck, Stefanie Jegelka and Michael Kaufmann
In: NIPS 2007, Vancouver, Canada (2008).

Abstract

Clustering is often formulated as a discrete optimization problem. The objective is to find, among all partitions of the data set, the best one according to some quality measure. However, in the statistical setting where we assume that the finite data set has been sampled from some underlying space, the goal is not to find the best partition of the given sample, but to approximate the true partition of the underlying space. We argue that the discrete optimization approach usually does not achieve this goal. As an alternative, we suggest the paradigm of “nearest neighbor clustering”. Instead of selecting the best out of all partitions of the sample, it only considers partitions in some restricted function class. Using tools from statistical learning theory we prove that nearest neighbor clustering is statistically consistent. Moreover, its worst case complexity is polynomial by construction, and it can be implemented with small average case complexity using branch and bound.
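To make the idea of a "restricted function class" concrete, here is a minimal sketch in Python under stated assumptions. It assumes the seed points are a random subset of m sample points, that the quality measure is the k-means within-cluster sum of squares, and that the search is plain brute-force enumeration rather than the branch and bound mentioned above. The names `nn_clustering`, `wcss`, and `nearest_seed` are hypothetical, chosen for illustration only. Each data point inherits the label of its nearest seed, so only the K^m labelings of the seeds are searched instead of all K^n partitions of the sample.

```python
import itertools
import random


def sq_dist(a, b):
    # Squared Euclidean distance between two equal-length tuples.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_seed(point, seeds):
    # Index of the seed closest to `point` (its Voronoi cell).
    return min(range(len(seeds)), key=lambda j: sq_dist(point, seeds[j]))


def wcss(points, labels, k):
    # Assumed quality measure: k-means within-cluster sum of squares.
    # Empty clusters contribute 0.
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum(sq_dist(p, centroid) for p in members)
    return total


def nn_clustering(points, k, m, seed=0):
    """Hypothetical sketch: best partition among those induced by labeling m seeds."""
    rng = random.Random(seed)
    # Assumption: seeds are a random subset of m sample points.
    seeds = [points[i] for i in rng.sample(range(len(points)), m)]
    # Each point is assigned once to its nearest seed's cell.
    cell = [nearest_seed(p, seeds) for p in points]
    best_labels, best_score = None, float("inf")
    # Brute force over all k**m seed labelings; each one induces a
    # partition of the whole sample via the nearest-seed assignment.
    for seed_labels in itertools.product(range(k), repeat=m):
        labels = [seed_labels[c] for c in cell]
        score = wcss(points, labels, k)
        if score < best_score:
            best_labels, best_score = labels, score
    return best_labels, best_score


pts = [(0.0,), (0.1,), (0.2,), (10.0,), (10.1,), (10.2,)]
labels, score = nn_clustering(pts, k=2, m=4, seed=1)
print(labels, round(score, 6))
```

The cost of this enumeration is K^m evaluations, each linear in the sample size n, so it does not grow exponentially in n. If m grows only logarithmically in n, then K^m = n^{c log K} for m = c log n, which is polynomial in n. Setting m = n recovers the unrestricted discrete optimization over all partitions of the sample.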

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Theory & Algorithms
ID Code: 3905
Deposited By: Ulrike von Luxburg
Deposited On: 25 February 2008