## Abstract

Consider the task of fitting a generic probability distribution $p(x)$ with a mixture of Gaussian components with identical fixed covariance matrices but different means and component weights. Does the optimal solution that minimises the Kullback-Leibler divergence always make use of all available components, or can it be clustered in the sense that only some of the components are used? We show that such clustering is generic except in the limit where the precision $\beta$ of the covariance matrix used is infinite, in which case the component means implement a centroidal Voronoi tessellation. For small $\beta$ the optimal solution is fully clustered to only a single component, and we determine the value of $\beta$ at which the first bifurcation to a larger mixture occurs. Numerical results show that increasing $\beta$ causes a cascade of further bifurcations. When $p(x)$ factorises over different dimensions of $x$, we show that the optimal Gaussian mixture factorises similarly, so that the bifurcation sequences combine. Finally, we investigate mixtures where $\beta$ is also optimised, and show that even there clustering can occur for non-trivial target distributions.
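The clustering effect can be illustrated numerically with a minimal sketch, which is not the procedure used in this work. It rests on several assumptions: the divergence is taken as $\mathrm{KL}(p \,\|\, q)$, so that minimising it over the mixture $q$ is equivalent to maximising the expected log-likelihood under $p$; the target is a one-dimensional standard Gaussian represented by samples; every component has variance $1/\beta$; and the fit is done by plain EM. The function name `fit_fixed_precision_mixture`, the initialisation, and the two values of $\beta$ are illustrative choices only.

```python
import numpy as np


def fit_fixed_precision_mixture(x, n_components, beta, n_iter=300):
    """EM for a 1-D Gaussian mixture whose components all have variance 1/beta.

    Only the means and weights are updated; beta stays fixed (assumption:
    isotropic, shared covariance with precision beta).
    """
    # Hypothetical initialisation: means spread symmetrically around the data mean.
    means = np.mean(x) + np.linspace(-1.0, 1.0, n_components)
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: log responsibilities, up to a constant shared by all components.
        log_r = np.log(weights) - 0.5 * beta * (x[:, None] - means[None, :]) ** 2
        log_r -= log_r.max(axis=1, keepdims=True)  # stabilise the exponentials
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted means and renormalised weights.
        mass = r.sum(axis=0) + 1e-12  # guard against an empty component
        means = (r * x[:, None]).sum(axis=0) / mass
        weights = mass / mass.sum()
    return means, weights


rng = np.random.default_rng(0)
samples = rng.normal(size=2000)  # samples standing in for the target p(x)

for beta in (0.1, 25.0):
    means, weights = fit_fixed_precision_mixture(samples, n_components=4, beta=beta)
    # np.ptp is max minus min: a spread near zero means the means have merged.
    print(f"beta={beta}: spread of means = {np.ptp(means):.3g}")
```

In this sketch a spread close to zero at the smaller $\beta$ indicates that all four means have merged into a single effective component, whereas at the larger $\beta$ the means remain separated. EM on a finite sample only finds a local optimum, so this serves as an illustration of the phenomenon rather than a test of the results.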