PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Information Bottleneck for Non Co-Occurrence Data
Yevgeny Seldin, Noam Slonim and Naftali Tishby
In: NIPS, Vancouver, Canada (2007).


We present a general model-independent approach to the analysis of data that do not appear in the form of co-occurrences of two variables X, Y, but rather as a sample of values of an unknown (stochastic) function Z(X, Y). For example, in gene expression data, the expression level Z is a function of gene X and condition Y; in movie ratings data, the rating Z is a function of viewer X and movie Y. The approach represents a consistent extension of the Information Bottleneck method, which has previously relied on the availability of co-occurrence statistics. By altering the relevance variable, we eliminate the need for a sample of the joint distribution of all input variables. This new formulation also enables simple MDL-like model complexity control and prediction of missing values of Z. The approach is analyzed and shown to be on a par with the best known clustering algorithms for a wide range of domains. For the prediction of missing values (collaborative filtering), it improves on the best currently known results.
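To make the data setting concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' algorithm. It assumes the data arrive as (x, y, z) triples, that hard cluster assignments for X and Y are already given, and that the quality of a clustering can be scored by the empirical mutual information between the cluster pair and Z. The abstract does not state the objective, so that score is an assumption, and all function and variable names are hypothetical. A missing value is predicted as the mean observed value in its block.

```python
import math
from collections import Counter, defaultdict

def block_statistics(samples, row_cluster, col_cluster):
    """Count how often each value z occurs in each (row-cluster, column-cluster) block."""
    counts = defaultdict(Counter)
    for x, y, z in samples:
        counts[(row_cluster[x], col_cluster[y])][z] += 1
    return counts

def mutual_information(counts):
    """Empirical mutual information (in bits) between the block index and Z."""
    total = sum(sum(c.values()) for c in counts.values())
    z_marginal = Counter()
    for c in counts.values():
        z_marginal.update(c)
    mi = 0.0
    for c in counts.values():
        block_total = sum(c.values())
        for z, n in c.items():
            # p(block, z) * log2( p(block, z) / (p(block) * p(z)) )
            mi += (n / total) * math.log2(n * total / (block_total * z_marginal[z]))
    return mi

def predict(counts, row_cluster, col_cluster, x, y):
    """Predict a missing Z(x, y) as the mean observed value in its block (None if the block is empty)."""
    c = counts.get((row_cluster[x], col_cluster[y]))
    if not c:
        return None
    return sum(z * n for z, n in c.items()) / sum(c.values())

# Toy movie-ratings sample (made-up data): viewers a-d, movies m1-m4.
samples = [
    ("a", "m1", 5), ("a", "m3", 1), ("b", "m2", 5), ("b", "m4", 1),
    ("c", "m1", 1), ("c", "m3", 5), ("d", "m2", 1), ("d", "m4", 5),
]
row_cluster = {"a": 0, "b": 0, "c": 1, "d": 1}              # assumed given
col_cluster = {"m1": 0, "m2": 0, "m3": 1, "m4": 1}          # assumed given

counts = block_statistics(samples, row_cluster, col_cluster)
score = mutual_information(counts)                           # 1.0 bit on this toy sample
guess = predict(counts, row_cluster, col_cluster, "a", "m2") # 5.0: the pair (a, m2) was never observed
print(score, guess)
```

Note that the sketch never uses a co-occurrence count of (x, y) pairs. It uses only the observed values of Z, which is the situation the abstract describes.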

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics
    Learning/Statistics & Optimisation
    Theory & Algorithms
ID Code: 4131
Deposited By: Yevgeny Seldin
Deposited On: 30 May 2008