PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Infinite Factorization of Multiple Non-parametric Views
Simon Rogers, Arto Klami, Janne Sinkkonen, Mark Girolami and Samuel Kaski
Machine Learning 2009. ISSN 1573-0565

Abstract

Combined analysis of multiple data sources has increasing application interest, in particular for distinguishing shared and source-specific aspects. We extend this rationale to the generative and non-parametric clustering setting by introducing a novel non-parametric hierarchical mixture model. The lower level of the model describes each source with a flexible non-parametric mixture, and the top level combines these to describe commonalities of the sources. The lower-level clusters arise from hierarchical Dirichlet Processes, inducing an infinite-dimensional contingency table between the sources. The commonalities between the sources are modeled by an infinite component model of the contingency table, interpretable as non-negative factorization of infinite matrices, or as a prior for infinite contingency tables. With Gaussian mixture components plugged in for continuous measurements, the model is applied to two views of genes, mRNA expression and abundance of the produced proteins, to expose groups of genes that are co-regulated in either or both of the views. We discover complex relationships between the marginals (that are multimodal in both marginals) that would remain undetected by simpler models. Cluster analysis of co-expression is a standard method of screening for co-regulation, and the two-view analysis extends the approach to distinguishing between pre- and post-translational regulation.
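
The reading of the top level as a non-negative factorization of a contingency table can be illustrated with a small finite sketch. It is not the authors' model or inference procedure: it assumes a truncation to three top-level components and fixed, hand-picked probabilities, whereas the paper places hierarchical Dirichlet Process priors on infinitely many clusters and components. All names (`expected_table`, `sample_table`, `WEIGHTS`, `ROWS`, `COLS`) and all numbers are hypothetical.

```python
import random

def expected_table(weights, rows, cols):
    """Expected contingency table: sum_k weights[k] * outer(rows[k], cols[k]).

    Every factor is non-negative, so the result is a non-negative
    factorization of the table of view-1 cluster by view-2 cluster.
    """
    n_r, n_c = len(rows[0]), len(cols[0])
    table = [[0.0] * n_c for _ in range(n_r)]
    for w, r, c in zip(weights, rows, cols):
        for i in range(n_r):
            for j in range(n_c):
                table[i][j] += w * r[i] * c[j]
    return table

def sample_table(weights, rows, cols, n, seed=0):
    """Draw n items: pick a top-level component, then one cluster per view."""
    rng = random.Random(seed)
    n_r, n_c = len(rows[0]), len(cols[0])
    counts = [[0] * n_c for _ in range(n_r)]
    for _ in range(n):
        k = rng.choices(range(len(weights)), weights=weights)[0]
        i = rng.choices(range(n_r), weights=rows[k])[0]  # cluster in view 1
        j = rng.choices(range(n_c), weights=cols[k])[0]  # cluster in view 2
        counts[i][j] += 1
    return counts

# Hypothetical toy parameters (assumed, not taken from the paper).
WEIGHTS = [0.5, 0.3, 0.2]                      # top-level component weights
ROWS = [[0.9, 0.1, 0.0],                       # p(view-1 cluster | component)
        [0.0, 0.2, 0.8],
        [0.1, 0.8, 0.1]]
COLS = [[0.8, 0.2],                            # p(view-2 cluster | component)
        [0.1, 0.9],
        [0.5, 0.5]]

if __name__ == "__main__":
    for row in expected_table(WEIGHTS, ROWS, COLS):
        print(["%.3f" % x for x in row])
    print(sample_table(WEIGHTS, ROWS, COLS, n=1000))
```

In this toy setting each sampled item stands for a gene assigned to one cluster per view. With enough draws, the normalized counts from `sample_table` approach the table returned by `expected_table`.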

EPrint Type: Article
Project Keyword: UNSPECIFIED
Subjects: Learning/Statistics & Optimisation; Multimodal Integration; Theory & Algorithms
ID Code: 6344
Deposited By: Simon Rogers
Deposited On: 08 March 2010