PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Randomized Algorithms for Fast Bayesian Hierarchical Clustering
Katherine Heller and Zoubin Ghahramani
In: PASCAL Statistics and Optimization of Clustering Workshop, Windsor, UK (2005).

Abstract

We present two new algorithms for fast Bayesian Hierarchical Clustering on large data sets. Bayesian Hierarchical Clustering (BHC) is a method for agglomerative hierarchical clustering based on evaluating marginal likelihoods of a probabilistic model. BHC has several advantages over traditional distance-based agglomerative clustering algorithms. It defines a probabilistic model of the data and uses Bayesian hypothesis testing to decide which merges are advantageous and to output the recommended depth of the tree. Moreover, the algorithm can be interpreted as a novel fast bottom-up approximate inference method for a Dirichlet process (i.e. countably infinite) mixture model (DPM). While the original BHC algorithm has O(n^2) computational complexity, the two new randomized algorithms are O(n log n) and O(n).
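The merge test described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not include the randomized algorithms. It assumes binary data with a Beta-Bernoulli cluster model (the prior parameters a and b are illustrative choices), and it uses the merge-probability recursion as given in the original BHC paper: d_k = alpha*Gamma(n_k) + d_left*d_right, pi_k = alpha*Gamma(n_k)/d_k, and r_k = pi_k p(D_k|H1) / p(D_k|T_k). All function and class names here are hypothetical. For brevity the loop rescans every pair at each step, so it is slower than the O(n^2) cost cited above.

```python
import math
from dataclasses import dataclass
from itertools import combinations


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_marginal(points, a=1.0, b=1.0):
    """log p(D | H1): all points come from one Beta-Bernoulli cluster (assumed model)."""
    n = len(points)
    total = 0.0
    for dim in range(len(points[0])):
        ones = sum(p[dim] for p in points)
        total += log_beta(a + ones, b + n - ones) - log_beta(a, b)
    return total


def logaddexp(x, y):
    """Numerically stable log(exp(x) + exp(y))."""
    m = max(x, y)
    return m + math.log(math.exp(x - m) + math.exp(y - m))


@dataclass
class Node:
    points: list          # data points under this subtree
    log_d: float          # log d_k from the recursion
    log_p: float          # log p(D_k | T_k), the tree marginal likelihood
    log_r: float = 0.0    # log posterior probability of the merged hypothesis
    children: tuple = ()


def leaf(point, alpha):
    # A leaf has d = alpha and p(D | T) = p(D | H1), so r = 1.
    return Node([point], math.log(alpha), log_marginal([point]))


def merge(left, right, alpha):
    points = left.points + right.points
    log_ag = math.log(alpha) + math.lgamma(len(points))   # log(alpha * Gamma(n_k))
    log_d = logaddexp(log_ag, left.log_d + right.log_d)
    log_h1 = (log_ag - log_d) + log_marginal(points)      # pi_k * p(D_k | H1)
    log_h2 = (left.log_d + right.log_d - log_d) + left.log_p + right.log_p
    log_p = logaddexp(log_h1, log_h2)
    return Node(points, log_d, log_p, log_h1 - log_p, (left, right))


def bhc(data, alpha=1.0):
    """Greedy agglomeration: always merge the pair with the highest r_k.

    Assumes data is non-empty. Naive version: rescans all pairs each step.
    """
    nodes = [leaf(tuple(x), alpha) for x in data]
    while len(nodes) > 1:
        i, j = max(combinations(range(len(nodes)), 2),
                   key=lambda ij: merge(nodes[ij[0]], nodes[ij[1]], alpha).log_r)
        merged = merge(nodes[i], nodes[j], alpha)
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0]


def cut(node, threshold=0.5):
    """Cut the tree wherever r_k falls below the threshold; return the clusters."""
    if not node.children or node.log_r >= math.log(threshold):
        return [node.points]
    return [c for child in node.children for c in cut(child, threshold)]
```

Calling cut(bhc(data)) returns the clusters implied by the hypothesis test, which is how the method can recommend a tree depth without a separate stopping rule. Working in log space avoids underflow in the marginal likelihoods as clusters grow.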

EPrint Type: Conference or Workshop Item (Paper)
Subjects: Computational, Information-Theoretic Learning with Statistics; Learning/Statistics & Optimisation; Information Retrieval & Textual Information Access
ID Code: 1161
Deposited By: Katherine Heller
Deposited On: 18 November 2005