PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Discriminative Clustering
Samuel Kaski, Janne Sinkkonen and Arto Klami
Neurocomputing 2004.

Abstract

A distributional clustering model for continuous data is reviewed and new methods for optimizing and regularizing it are introduced and compared. Based on samples of discrete-valued auxiliary data associated to samples of the continuous primary data, the continuous data space is partitioned into Voronoi regions that are maximally homogeneous in terms of the discrete data. Then only variation in the primary data associated to variation in the discrete data affects the clustering; the discrete data "supervises" the clustering. Because the whole continuous space is partitioned, new samples can be easily clustered by the continuous part of the data alone. In experiments, the approach is shown to produce more homogeneous clusters than alternative methods. Two regularization methods are demonstrated to further improve the results: an entropy-type penalty for unequal cluster sizes, and the inclusion of a model for the marginal density of the primary data. The latter is also interpretable as special kind of joint distribution modeling with tunable emphasis for discrimination and the marginal density.

Postscript - Archive staff only - Requires a viewer, such as GhostView
EPrint Type:Article
Project Keyword:Project Keyword UNSPECIFIED
Subjects:Computational, Information-Theoretic Learning with Statistics
Learning/Statistics & Optimisation
Theory & Algorithms
ID Code:452
Deposited By:Arto Klami
Deposited On:23 December 2004