Clustering of contingency table and mixture model
Gérard Govaert and Mohamed Nadif
Basing cluster analysis on mixture models has become a classical and powerful approach. It enables some classical criteria, such as the well-known k-means criterion, to be explained. To classify the rows or the columns of a contingency table, an adapted version of k-means known as Mndki2, which uses the chi-square distance, can be used. Unfortunately, this simple and effective method, which can be used jointly with correspondence analysis because both rely on the same representation of the data, cannot be associated with a mixture model in the same way as the classical k-means algorithm. In this paper we show that the Mndki2 algorithm can be viewed as an approximation of a classifying version of the EM algorithm for a mixture of multinomial distributions. The algorithms arising in this context are compared experimentally using Monte Carlo simulations.
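To make the idea of a classifying version of EM for a multinomial mixture concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each row of the contingency table is treated as a multinomial sample whose column profile depends on its cluster, and it alternates a hard assignment step with a parameter update. The function name `cem_multinomial`, the smoothing constant `eps`, and the toy table are all hypothetical choices made for illustration.

```python
import math

def cem_multinomial(table, labels, n_clusters, max_iter=50, eps=1e-9):
    """Hard-assignment (classification) EM sketch for the rows of a contingency table.

    table  : list of rows, each a list of non-negative counts
    labels : initial cluster index for each row
    eps    : small smoothing constant (an assumption, used to avoid log(0))
    """
    n_rows, n_cols = len(table), len(table[0])
    labels = list(labels)
    for _ in range(max_iter):
        # M step: estimate proportions and column profiles from the current partition.
        sizes = [0] * n_clusters
        sums = [[0.0] * n_cols for _ in range(n_clusters)]
        for row, k in zip(table, labels):
            sizes[k] += 1
            for j, x in enumerate(row):
                sums[k][j] += x
        log_pi = [math.log((sizes[k] + eps) / (n_rows + n_clusters * eps))
                  for k in range(n_clusters)]
        log_alpha = []
        for k in range(n_clusters):
            total = sum(sums[k]) + n_cols * eps
            log_alpha.append([math.log((s + eps) / total) for s in sums[k]])
        # E and C steps combined: assign each row to the cluster that maximises
        # log(pi_k) + sum_j x_ij * log(alpha_kj), i.e. the largest posterior.
        new_labels = []
        for row in table:
            scores = [log_pi[k] + sum(x * la for x, la in zip(row, log_alpha[k]))
                      for k in range(n_clusters)]
            new_labels.append(scores.index(max(scores)))
        if new_labels == labels:  # partition is stable, so stop
            break
        labels = new_labels
    return labels

# Toy table: rows 0-1 load on the first column, rows 2-3 on the second.
table = [[20, 1, 1], [18, 2, 0], [1, 19, 2], [0, 22, 1]]
result = cem_multinomial(table, labels=[0, 1, 0, 1], n_clusters=2)
```

On this toy table the two row groups should end up in separate clusters even though the starting partition mixes them. Like k-means, a procedure of this kind depends on its initial partition, so in practice several starts would normally be tried.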