PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Interpretable Clustering via Model-Based Divisive Hierarchical Classification
Nadege Sujka, Gérard Govaert and Christophe Ambroise
In: 29th Annual GFKL (Gesellschaft für Klassifikation), 9-11 March 2005, Magdeburg, Germany.

Abstract

This paper proposes a new clustering algorithm that produces an interpretable partition. Clustering aims to determine the intrinsic structure of a data set. Many clustering algorithms have been developed, but most of them do not provide interpretable descriptions of the resulting clusters. The monothetic divisive clustering approach has the advantage of providing both a hierarchy and a simple interpretation of its clusters. The clustering method proposed in this paper is model-based: we propose a model that organizes the data into a cluster hierarchy. Our method consists of two steps. In the first step, the data set is recursively divided into two subsets so as to maximize the likelihood; each split is carried out using a single variable. The second step consists of pruning the tree grown in the first step. We use cost-complexity pruning, also known as the CART pruning algorithm. It creates a sequence of nested trees in which the first tree is the initial tree and the last tree consists of a single leaf representing the entire data set. We use the Bayesian information criterion (BIC) to determine the number of clusters and to select a partition. The performance of the algorithm is illustrated using both synthetic and real-life data.
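The first step and the BIC comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the probability model, so the code below assumes a diagonal Gaussian per cluster and a classification likelihood without mixing proportions; the function names (`gaussian_loglik`, `best_split`, `bic`), the variance floor, the minimum cluster size and the toy data are all hypothetical and not taken from the paper. It performs one monothetic split only and omits the cost-complexity pruning step.

```python
import math

def gaussian_loglik(rows, var_floor=1e-6):
    """Log-likelihood of rows under a diagonal Gaussian fitted to them.

    The diagonal Gaussian is an assumption of this sketch, not the paper's model.
    """
    n, d = len(rows), len(rows[0])
    total = 0.0
    for j in range(d):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        ss = sum((v - mean) ** 2 for v in col)
        var = max(ss / n, var_floor)  # floor avoids log(0) on constant columns
        total += -0.5 * n * math.log(2 * math.pi * var) - 0.5 * ss / var
    return total

def best_split(rows, min_size=2):
    """Best binary split on a single variable: (loglik, variable index, threshold)."""
    best = None
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        # Candidate thresholds are midpoints between consecutive observed values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [r for r in rows if r[j] <= t]
            right = [r for r in rows if r[j] > t]
            if len(left) < min_size or len(right) < min_size:
                continue
            ll = gaussian_loglik(left) + gaussian_loglik(right)
            if best is None or ll > best[0]:
                best = (ll, j, t)
    return best

def bic(loglik, n_params, n):
    """BIC in the 'lower is better' convention: -2 log L + k log n."""
    return -2.0 * loglik + n_params * math.log(n)

# Toy data: two groups separated on variable 0; variable 1 is uninformative.
rows = [(0.0, 1.0), (0.2, 0.8), (0.1, 1.2), (0.3, 0.9),
        (5.0, 1.1), (5.2, 0.9), (5.1, 1.0), (5.3, 1.2)]
n, d = len(rows), len(rows[0])

split = best_split(rows)                       # (loglik, variable, threshold)
bic_one = bic(gaussian_loglik(rows), 2 * d, n)  # one cluster: d means + d variances
bic_two = bic(split[0], 4 * d, n)               # two clusters: twice as many parameters
print("split on variable", split[1], "at threshold", split[2])
print("two clusters preferred by BIC:", bic_two < bic_one)
```

Because each split is defined by one variable and one threshold, every cluster can be described by a short conjunction of rules such as "variable 0 is at most t", which is the source of the interpretability. In a full procedure the split would be applied recursively to each subset, and BIC would then be evaluated over the nested sequence of pruned trees rather than over two candidates only.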

EPrint Type: Conference or Workshop Item (Invited Talk)
Project Keyword: UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics; Learning/Statistics & Optimisation
ID Code: 1955
Deposited By: Gérard Govaert
Deposited On: 30 December 2005