PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

A sequential Monte Carlo algorithm for coalescent clustering.
Dilan Gorur and Yee Whye Teh
In: Nonparametric Bayes Workshop, UAI/ICML 2008, Finland (2008).


Algorithms for automatically discovering hierarchical structure from data play an important role in machine learning. Teh et al. (2008) proposed a Bayesian hierarchical clustering model based on Kingman’s coalescent (Kingman, 1982b), together with greedy and sequential Monte Carlo (SMC) agglomerative clustering algorithms for inference; the SMC algorithm has computational cost O(n^3) per particle, where n is the number of data items. We build upon this work and propose a new SMC-based algorithm for inference in the coalescent clustering model. Our algorithm is based on a different perspective on Kingman’s coalescent from that of Teh et al. (2008), one in which the computations required to consider merging each pair of clusters at each iteration are not discarded in subsequent iterations. This reduces the computational cost to O(n^2) per particle. In experiments we show that our new algorithm achieves the reduced cost without sacrificing accuracy or reliability.
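
For readers unfamiliar with the prior underlying this model, the following is a minimal sketch of drawing one tree from Kingman's n-coalescent: while k lineages remain, the waiting time to the next merge is exponential with rate k(k-1)/2, and the pair that merges is chosen uniformly at random. This illustrates only the prior over trees, not the inference algorithm of the paper or of Teh et al. (2008); the function name, the node labelling, and the layout of the returned merge events are assumptions made for illustration.

```python
import random


def sample_kingman_coalescent(n, seed=None):
    """Draw one tree from Kingman's n-coalescent prior (illustrative sketch).

    Returns a list of (time, left, right, parent) merge events. Time is
    measured backwards from the present (time 0), so it grows with each merge.
    Leaves are labelled 0..n-1; internal nodes get labels n, n+1, ...
    """
    rng = random.Random(seed)
    active = list(range(n))  # lineages that have not yet been merged
    next_label = n
    t = 0.0
    merges = []
    while len(active) > 1:
        k = len(active)
        # Each of the k*(k-1)/2 pairs coalesces independently at rate 1,
        # so the time to the first merge is exponential with the total rate.
        rate = k * (k - 1) / 2.0
        t += rng.expovariate(rate)
        # The merging pair is uniform over all pairs of active lineages.
        i, j = rng.sample(range(k), 2)
        left, right = active[i], active[j]
        # Remove the larger index first so the smaller index stays valid.
        for idx in sorted((i, j), reverse=True):
            active.pop(idx)
        active.append(next_label)
        merges.append((t, left, right, next_label))
        next_label += 1
    return merges
```

Calling `sample_kingman_coalescent(5, seed=0)` yields four merge events, one per internal node of the binary tree over five leaves.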

EPrint Type: Conference or Workshop Item (Poster)
Subjects: Learning/Statistics & Optimisation; Theory & Algorithms
ID Code: 5350
Deposited By: Dilan Gorur
Deposited On: 24 March 2009