A sequential Monte Carlo algorithm for coalescent clustering
Algorithms for automatically discovering hierarchical structure from data play an important role in machine learning. Teh et al. (2008) proposed a Bayesian hierarchical clustering model based on Kingman's coalescent (Kingman, 1982b), together with greedy and sequential Monte Carlo (SMC) agglomerative clustering algorithms for inference. Their SMC algorithm has a computational cost of O(n^3) per particle, where n is the number of data items. We build upon this work and propose a new SMC-based algorithm for inference in the coalescent clustering model. Our algorithm rests on a different perspective on Kingman's coalescent from that in (Teh et al., 2008), one in which the computations required to consider merging each pair of clusters at one iteration are not discarded in subsequent iterations. This reduces the computational cost to O(n^2) per particle. In experiments we show that our new algorithm achieves this lower cost without sacrificing accuracy or reliability.
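To see why keeping pairwise computations across iterations changes the cost from cubic to quadratic, it helps to count pair evaluations in a plain agglomerative loop. The sketch below is a deterministic, greedy toy and not the SMC sampler of this paper: the distance-between-means score, the heap with lazy deletion, and the function names `agglomerate_recompute` and `agglomerate_cached` are all hypothetical stand-ins. It only illustrates the counting argument. Re-scoring every pair at every merge costs C(n, 2) + C(n-1, 2) + ... = C(n+1, 3) = O(n^3) evaluations. Scoring all pairs once and then scoring only the pairs that involve the newly merged cluster costs C(n, 2) + C(n-1, 2) = (n-1)^2 = O(n^2) evaluations.

```python
import heapq
import itertools


def _dist(clusters, a, b):
    # Hypothetical per-pair quantity: distance between cluster means.
    # A real sampler would compute something model-specific here.
    return abs(clusters[a][0] - clusters[b][0])


def _merge(clusters, a, b, new_id):
    # Replace clusters a and b by one cluster at their size-weighted mean.
    (ma, na), (mb, nb) = clusters.pop(a), clusters.pop(b)
    clusters[new_id] = ((ma * na + mb * nb) / (na + nb), na + nb)


def agglomerate_recompute(points):
    """Re-evaluate every pair at every iteration: O(n^3) pair evaluations."""
    clusters = {i: (x, 1) for i, x in enumerate(points)}
    next_id, evals, merges = len(points), 0, []
    while len(clusters) > 1:
        best = None
        for a, b in itertools.combinations(sorted(clusters), 2):
            evals += 1
            cand = (_dist(clusters, a, b), a, b)  # ties broken by ids
            if best is None or cand < best:
                best = cand
        _, a, b = best
        _merge(clusters, a, b, next_id)
        merges.append((a, b))
        next_id += 1
    return merges, evals


def agglomerate_cached(points):
    """Keep pair evaluations across iterations: O(n^2) pair evaluations."""
    clusters = {i: (x, 1) for i, x in enumerate(points)}
    heap, evals = [], 0
    for a, b in itertools.combinations(sorted(clusters), 2):
        evals += 1
        heap.append((_dist(clusters, a, b), a, b))
    heapq.heapify(heap)
    next_id, merges = len(points), []
    while len(clusters) > 1:
        _, a, b = heapq.heappop(heap)
        if a not in clusters or b not in clusters:
            continue  # stale entry: one side was already merged away
        _merge(clusters, a, b, next_id)
        merges.append((a, b))
        # Pairs not involving the new cluster keep their cached scores;
        # only pairs with the new cluster need a fresh evaluation.
        for c in clusters:
            if c != next_id:
                evals += 1
                heapq.heappush(heap, (_dist(clusters, c, next_id), c, next_id))
        next_id += 1
    return merges, evals
```

For n = 10 points, both functions produce the same merge sequence, but the first makes 165 pair evaluations and the second makes 81. The heap adds a logarithmic factor to selecting the best pair. That overhead is separate from the number of pairwise computations, which is what the abstract's O(n^3) versus O(n^2) comparison concerns.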