PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Clustering and bifurcations in Gaussian mixture models
Matthew Urry and Peter Sollich
(2009) Technical Report. Matthew Urry.

Abstract

Consider the task of fitting a generic probability distribution p(x) with a mixture of Gaussian components that share an identical, fixed covariance matrix but have different means and component weights. Does the optimal solution that minimises the Kullback-Leibler divergence always make use of all available components, or can it be clustered in the sense that only some of the components are used? We show that such clustering is generic except in the limit where the precision $\beta$ of the covariance matrix is infinite, in which case the component means implement a centroidal Voronoi tessellation. For small $\beta$ the optimal solution is fully clustered into a single component, and we determine the value of $\beta$ at which the first bifurcation to a larger mixture occurs. Numerical results show that increasing $\beta$ causes a cascade of further bifurcations. When p(x) factorises over different dimensions of x, we show that the optimal Gaussian mixture factorises similarly, so that the bifurcation sequences combine. Finally, we investigate mixtures in which $\beta$ is also optimised, and show that even there clustering can occur for non-trivial target distributions.
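
The clustering effect described in the abstract can be illustrated numerically. The following is a minimal sketch and not the authors' method: it assumes the divergence is KL(p||q), so that minimising it over the mixture q is equivalent to maximising the expected log-likelihood under p, and it uses a standard EM iteration in one dimension with the component variance held at 1/$\beta$. The target p(x) is represented by a set of points, and the function names, the initialisation and the default parameters are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_fixed_precision_mixture(x, beta, n_components=4, n_iter=500):
    """EM for a 1-D Gaussian mixture whose components share variance 1/beta.

    Only the weights and means are updated; beta stays fixed throughout.
    """
    x = np.asarray(x, dtype=float)
    # Hypothetical initialisation: means slightly spread around the data mean.
    means = x.mean() + np.linspace(-0.1, 0.1, n_components)
    weights = np.full(n_components, 1.0 / n_components)

    for _ in range(n_iter):
        # E-step: log of w_k * exp(-beta * (x_i - m_k)^2 / 2), up to a constant.
        sq_dist = (x[:, None] - means[None, :]) ** 2
        log_r = np.log(np.maximum(weights, 1e-300)) - 0.5 * beta * sq_dist
        log_r -= log_r.max(axis=1, keepdims=True)  # stabilise the exponentials
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)  # responsibilities sum to 1 per point

        # M-step: re-estimate weights and means from the responsibilities.
        mass = r.sum(axis=0)
        weights = mass / x.size
        means = (r * x[:, None]).sum(axis=0) / np.maximum(mass, 1e-300)

    return weights, means


def mean_spread(weights, means):
    """Weighted standard deviation of the component means.

    A value near zero indicates that all components have merged.
    """
    centre = np.sum(weights * means)
    return float(np.sqrt(np.sum(weights * (means - centre) ** 2)))


if __name__ == "__main__":
    # Assumed target: a uniform density on [-1, 1], represented by a regular grid.
    x = np.linspace(-1.0, 1.0, 2001)
    for beta in (0.1, 25.0):
        w, m = fit_fixed_precision_mixture(x, beta)
        print(f"beta={beta}: spread of means = {mean_spread(w, m):.4f}")
```

In this sketch a small $\beta$ drives all component means to the same point, so the mixture is effectively a single Gaussian, while a large $\beta$ lets the means separate. Scanning $\beta$ between the two values would be one way to locate bifurcation points numerically, although EM only finds local optima and the result depends on the initialisation.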

EPrint Type: Monograph (Technical Report)
Project Keyword: UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics; Learning/Statistics & Optimisation
ID Code: 6068
Deposited By: Matthew Urry
Deposited On: 08 March 2010