PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Sequential information bottleneck for finite data
Jaakko Peltonen, Janne Sinkkonen and Samuel Kaski
In: ICML 2004, 4-8 Jul 2004, Banff, Canada.

Abstract

The sequential information bottleneck (sIB) algorithm clusters co-occurrence data such as text documents vs. words. We introduce a variant that models sparse co-occurrence data by a generative process. This turns the sIB objective function, mutual information, into a Bayes factor, while preserving it asymptotically for non-sparse data. Experimental performance of the new algorithm is comparable to that of the original sIB for large data sets, and better for smaller, sparse sets.
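
To make the objective named in the abstract concrete, the following is a minimal sketch of the plug-in mutual information between document clusters and words, computed from a clusters-by-words count table. It illustrates only the standard mutual information score that a hard clustering is rated by, not the Bayes factor variant introduced in the paper, whose exact form the abstract does not give. The function names, the toy counts, and the two candidate assignments are all hypothetical and chosen for illustration.

```python
import numpy as np


def cluster_word_counts(doc_word_counts, assignments, n_clusters):
    """Sum document-by-word counts into a clusters-by-words table (hypothetical helper)."""
    doc_word_counts = np.asarray(doc_word_counts, dtype=float)
    table = np.zeros((n_clusters, doc_word_counts.shape[1]))
    # Unbuffered add so documents sharing a cluster accumulate into the same row.
    np.add.at(table, np.asarray(assignments), doc_word_counts)
    return table


def mutual_information(counts):
    """Plug-in estimate of I(C; W) in nats from a clusters-by-words count table."""
    counts = np.asarray(counts, dtype=float)
    joint = counts / counts.sum()           # empirical p(c, w)
    p_c = joint.sum(axis=1, keepdims=True)  # marginal p(c)
    p_w = joint.sum(axis=0, keepdims=True)  # marginal p(w)
    nz = joint > 0                          # treat 0 * log 0 as 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (p_c * p_w)[nz])))


# Toy data (made up): four documents over a two-word vocabulary.
docs = [[4, 0], [3, 1], [0, 5], [1, 4]]

# Two candidate hard clusterings of the documents into two clusters.
mi_grouped = mutual_information(cluster_word_counts(docs, [0, 0, 1, 1], 2))
mi_mixed = mutual_information(cluster_word_counts(docs, [0, 1, 0, 1], 2))

print(f"similar documents grouped: {mi_grouped:.4f} nats")
print(f"dissimilar documents mixed: {mi_mixed:.4f} nats")
```

Grouping documents with similar word usage gives the higher score, while the mixed assignment produces identical cluster rows and therefore zero mutual information. With counts this small the plug-in estimate is noisy, which is the sparse-data regime the abstract targets.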

EPrint Type: Conference or Workshop Item (Paper)
Subjects: Computational, Information-Theoretic Learning with Statistics; Theory & Algorithms; Information Retrieval & Textual Information Access
ID Code: 87
Deposited By: Jaakko Peltonen
Deposited On: 14 May 2004