PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Multi-Way Distributional Clustering via Pairwise Interactions
Ron Bekkerman, Ran El-Yaniv and Andrew McCallum
In: ICML 2005, 7-11 August, Bonn, Germany.


We present a novel unsupervised learning scheme that simultaneously clusters variables of several types (e.g., documents, words and authors) based on pairwise interactions between the types, as observed in co-occurrence data. In this scheme, multiple clustering systems are constructed so as to maximize an objective function that aggregates the pairwise mutual information between the cluster variables. To implement this idea, we propose an algorithm that interleaves top-down clustering of some variables with bottom-up clustering of the others, combined with a local optimization correction routine. Focusing on document clustering, we present an extensive empirical study of two-way, three-way and four-way applications of our scheme on six real-world datasets, including 20 Newsgroups (20NG) and the Enron email collection. Our multi-way distributional clustering (MDC) algorithms consistently and significantly outperform previous state-of-the-art information-theoretic clustering algorithms.
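The abstract names two ingredients: an objective built from pairwise mutual information between cluster variables, and a local correction routine. The following is a minimal sketch of both, written under stated assumptions rather than taken from the paper. It assumes the objective is an unweighted sum of I(X~a; X~b) over the observed pairs of types. It implements only a greedy single-element reassignment pass, not the full interleaved top-down/bottom-up MDC schedule. The function names (clustered_mi, multiway_objective, correction_pass) and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def clustered_mi(counts, row_assign, col_assign):
    """Mutual information (bits) between row clusters and column clusters.

    counts: dict mapping (row_element, col_element) -> co-occurrence count.
    row_assign, col_assign: dicts mapping each element to a cluster id.
    """
    # Aggregate element-level counts into a cluster-by-cluster contingency table.
    joint = defaultdict(float)
    for (r, c), n in counts.items():
        joint[(row_assign[r], col_assign[c])] += n
    total = sum(joint.values())
    # Marginal probabilities of the two cluster variables.
    p_row, p_col = defaultdict(float), defaultdict(float)
    for (a, b), n in joint.items():
        p_row[a] += n / total
        p_col[b] += n / total
    # I(A; B) = sum over (a, b) of p(a,b) * log2(p(a,b) / (p(a) p(b))).
    mi = 0.0
    for (a, b), n in joint.items():
        p = n / total
        if p > 0:
            mi += p * math.log2(p / (p_row[a] * p_col[b]))
    return mi


def multiway_objective(tables, assignments):
    """ASSUMED objective: unweighted sum of pairwise MI over observed type pairs."""
    return sum(
        clustered_mi(counts, assignments[a], assignments[b])
        for (a, b), counts in tables.items()
    )


def correction_pass(tables, assignments, var):
    """Greedy local correction (hypothetical): move each element of `var` to
    the cluster that most increases the objective. Returns the move count."""
    assign = assignments[var]
    clusters = sorted(set(assign.values()))
    moved = 0
    for elem in list(assign):
        original = assign[elem]
        # Never empty a cluster: skip elements that are alone in theirs.
        if sum(1 for v in assign.values() if v == original) == 1:
            continue
        best_c, best_score = original, multiway_objective(tables, assignments)
        for c in clusters:
            if c == original:
                continue
            assign[elem] = c
            score = multiway_objective(tables, assignments)
            if score > best_score + 1e-12:
                best_c, best_score = c, score
        assign[elem] = best_c
        if best_c != original:
            moved += 1
    return moved


# Hypothetical three-way toy data: two topics, each with two documents,
# two words and one author.
docs_words = {"d1": ["w1", "w2"], "d2": ["w1", "w2"],
              "d3": ["w3", "w4"], "d4": ["w3", "w4"]}
tables = {
    ("doc", "word"): {(d, w): 1 for d, ws in docs_words.items() for w in ws},
    ("doc", "author"): {("d1", "a1"): 1, ("d2", "a1"): 1,
                        ("d3", "a2"): 1, ("d4", "a2"): 1},
}
assignments = {
    "doc": {"d1": 0, "d2": 1, "d3": 1, "d4": 1},  # d2 starts in the wrong cluster
    "word": {"w1": 0, "w2": 0, "w3": 1, "w4": 1},
    "author": {"a1": 0, "a2": 1},
}
before = multiway_objective(tables, assignments)
moved = correction_pass(tables, assignments, "doc")
after = multiway_objective(tables, assignments)
```

On this toy input, the pass moves only d2, which raises the assumed objective from about 0.62 to 2.0 bits (1 bit for each of the two type pairs). Recomputing the whole objective for every candidate move keeps the sketch short. A practical implementation would instead update the contingency tables incrementally.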

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics; Multimodal Integration; Information Retrieval & Textual Information Access
ID Code: 1568
Deposited By: Ran El-Yaniv
Deposited On: 28 November 2005