Interpretable Clustering via Model-Based Divisive Hierarchical Classification
This paper proposes a new clustering algorithm that produces an interpretable partition. Clustering aims to uncover the intrinsic structure of a data set. Many clustering algorithms have been developed, but most of them do not provide interpretable descriptions of the resulting clusters. The monothetic divisive approach, in which every split is defined by a single variable, has the advantage of simultaneously providing a hierarchy and a simple interpretation of its clusters. The method proposed in this paper is model-based: we propose a model that organizes the data into a cluster hierarchy. The method has two steps. First, the data set is recursively divided into two subsets so as to maximize the likelihood, with each split carried out on a single variable. Second, the resulting tree is pruned using cost-complexity pruning, also known as the CART pruning algorithm. This produces a sequence of nested trees, where the first tree is the initial tree and the last is a tree consisting of a single leaf that represents the entire data set. We use the Bayesian information criterion (BIC) to select a partition from this sequence and thereby determine the number of clusters. The performance of the algorithm is illustrated on both synthetic and real-life data.
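The three stages described above (greedy single-variable splitting, weakest-link pruning, BIC selection) can be sketched end to end as follows. This is a minimal illustration under assumptions that the abstract does not state: each cluster is modeled as a diagonal Gaussian, the likelihood is the classification likelihood with proportions n_k/N, cut points are midpoints between consecutive observed values, and a hypothetical `min_size` stops the growth of the maximal tree. Every name (`grow`, `prune_sequence`, `bic`, `cluster`, `MIN_VAR`) is hypothetical and not taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Row = Tuple[float, ...]
MIN_VAR = 1e-6  # hypothetical variance floor guarding against degenerate clusters


@dataclass
class Node:
    rows: List[Row]
    ll: float                        # this cluster's term in the classification log-likelihood
    split_var: Optional[int] = None  # index of the single splitting variable (monothetic)
    thr: Optional[float] = None      # rows with row[split_var] <= thr go left
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def node_loglik(rows: Sequence[Row], n_total: int) -> float:
    """Assumed model: n_k * log(n_k / N) plus a diagonal-Gaussian log-likelihood at the MLE."""
    n = len(rows)
    ll = n * math.log(n / n_total)
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        ss = sum((v - mean) ** 2 for v in col)
        var = max(ss / n, MIN_VAR)
        ll += -0.5 * n * math.log(2 * math.pi * var) - ss / (2 * var)
    return ll


def grow(rows: List[Row], n_total: int, min_size: int = 5) -> Node:
    """Recursively apply the single-variable binary split with the largest likelihood gain."""
    node = Node(rows, node_loglik(rows, n_total))
    best = None
    for j in range(len(rows[0])):
        values = sorted({r[j] for r in rows})
        for a, b in zip(values, values[1:]):
            thr = (a + b) / 2  # candidate cut halfway between consecutive observed values
            left = [r for r in rows if r[j] <= thr]
            right = [r for r in rows if r[j] > thr]
            if len(left) < min_size or len(right) < min_size:
                continue
            gain = node_loglik(left, n_total) + node_loglik(right, n_total) - node.ll
            if best is None or gain > best[0]:
                best = (gain, j, thr, left, right)
    if best is not None:  # grow a maximal tree; pruning decides which splits survive
        _, node.split_var, node.thr, left, right = best
        node.left = grow(left, n_total, min_size)
        node.right = grow(right, n_total, min_size)
    return node


def leaves(node: Node) -> List[Node]:
    if node.left is None:
        return [node]
    return leaves(node.left) + leaves(node.right)


def internal_nodes(node: Node) -> List[Node]:
    if node.left is None:
        return []
    return [node] + internal_nodes(node.left) + internal_nodes(node.right)


def prune_sequence(root: Node) -> List[List[Node]]:
    """CART-style weakest-link pruning with cost = -log-likelihood; mutates the tree."""
    def g(t: Node) -> float:
        sub = leaves(t)
        return (sum(leaf.ll for leaf in sub) - t.ll) / (len(sub) - 1)

    seq = [leaves(root)]
    while root.left is not None:
        weakest = min(internal_nodes(root), key=g)  # smallest gain per extra leaf
        weakest.left = weakest.right = None          # collapse its subtree into one leaf
        seq.append(leaves(root))
    return seq  # nested partitions, from the full tree down to a single leaf


def bic(partition: List[Node], n_total: int, d: int) -> float:
    k = len(partition)
    loglik = sum(leaf.ll for leaf in partition)
    n_params = 2 * d * k + (k - 1)  # per-cluster means and variances, plus proportions
    return -2.0 * loglik + n_params * math.log(n_total)  # lower is better in this convention


def cluster(rows: List[Row], min_size: int = 5) -> List[Node]:
    n, d = len(rows), len(rows[0])
    return min(prune_sequence(grow(rows, n, min_size)), key=lambda p: bic(p, n, d))
```

Calling `cluster(rows)` on a list of equal-length numeric tuples returns the leaves of the BIC-selected tree; each leaf keeps its `rows`, and its ancestors' `split_var`/`thr` pairs give the single-variable rule that describes it. Including the n_k log(n_k/N) term in each node's score makes the partition log-likelihood a plain sum over leaves, which is what lets the CART weakest-link rule be reused with negative log-likelihood as the cost. This version prunes one node at a time and scans every cut point naively (quadratic in the node size per variable), so it is meant for illustration rather than large data sets.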