Nonlinear Encoder Models for Information-Theoretic Clustering
Felix Agakov and David Barber
In: Statistics and Optimization of Clustering Workshop, 4-5 Jul 2005, Windsor, UK.
The problem of learning anthropomorphically meaningful cluster allocations has conventionally been addressed by probability density estimation techniques or by algorithmic spectral clustering methods. Here we propose a simple information-theoretic alternative to these common clustering approaches. The key idea is to maximize the mutual information $I(x,y)$ between the unknown cluster labels $y$ and the training patterns $x$ with respect to the parameters of specifically constrained encoding distributions $p(y|x)$. We show that by considering a kernelized representation of $p(y|x)$, the method can be used to learn optimal parameters of the kernel function, which corresponds to principled information-theoretic learning of the affinity matrix. The resulting procedure requires neither inverses nor eigenvalue decompositions of Gram matrices, which makes it suitable for clustering large data sets. Additionally, once the encoder model is parameterized, the method does not require complex problem-specific algorithmic heuristics or heuristic approximations of the objective criterion. Empirically, we demonstrate that the resulting information-theoretic clustering approach compares favorably with common generative clustering methods.
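To make the objective concrete, here is a minimal NumPy sketch of infomax-style clustering in the spirit of the abstract: a softmax encoder $p(y|x)$ built from RBF-style similarities to cluster centres, with the empirical mutual information $I(x,y) = H(y) - H(y|x)$ maximized by naive numerical gradient ascent. The toy data, the centre-based parameterization, and all hyperparameters (`beta`, `lr`, the two-blob data set) are illustrative assumptions, not the encoder models or experiments of the paper; note that, as the abstract states, no Gram-matrix inverse or eigendecomposition is needed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two well-separated Gaussian blobs (an assumed illustration,
# not a data set from the paper).
X = np.vstack([rng.normal(-2.0, 0.3, size=(20, 2)),
               rng.normal(+2.0, 0.3, size=(20, 2))])

def encoder(X, centres, beta):
    """Soft cluster posteriors p(y|x): a softmax over negative squared
    distances to cluster centres, i.e. a simple RBF-kernelized encoder."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)  # N x K
    logits = -beta * d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def mutual_information(p, eps=1e-12):
    """Empirical I(x;y) = H(y) - H(y|x), with a uniform empirical
    distribution over the training patterns; p is the N x K posterior."""
    py = p.mean(axis=0)
    H_y = -(py * np.log(py + eps)).sum()
    H_y_given_x = -(p * np.log(p + eps)).sum(axis=1).mean()
    return H_y - H_y_given_x

# Maximize I(x;y) over the encoder parameters (here, the centres) by
# naive forward-difference gradient ascent.
centres = np.array([[-0.5, 0.0], [0.5, 0.0]])
beta, lr, h = 1.0, 0.2, 1e-5
I0 = mutual_information(encoder(X, centres, beta))
for _ in range(300):
    grad = np.zeros_like(centres)
    base = mutual_information(encoder(X, centres, beta))
    for idx in np.ndindex(*centres.shape):
        bumped = centres.copy()
        bumped[idx] += h
        grad[idx] = (mutual_information(encoder(X, bumped, beta)) - base) / h
    centres += lr * grad

labels = encoder(X, centres, beta).argmax(axis=1)
I_final = mutual_information(encoder(X, centres, beta))
# For two balanced, well-separated clusters, I_final typically
# approaches H(y) = log 2, with each blob assigned its own label.
```

In the paper's kernelized setting, the gradient would instead be taken with respect to the kernel parameters themselves (learning the affinity), and a proper analytic gradient would replace the finite differences used here for brevity.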