PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Associative clustering for exploring dependencies between functional genomics data sets
Samuel Kaski, Janne Nikkilä, Janne Sinkkonen, Leo Lahti, Juha Knuutila and Christophe Roos
IEEE/ACM Transactions on Computational Biology and Bioinformatics Volume 2, pp. 203-216, 2005.


High-throughput genomic measurements, interpreted as co-occurring data samples from multiple sources, open up a fresh problem for machine learning: what do the different data sets have in common, that is, what kind of statistical dependencies exist between the paired samples from the different sets? We introduce a clustering algorithm for exploring these dependencies. Samples within each data set are grouped such that the dependencies between groups of different sets capture as much of the pairwise dependencies between the samples as possible. We formalize this problem in a novel probabilistic way, as optimization of a Bayes factor. The method is applied to reveal commonalities and exceptions in gene expression between organisms, and to suggest regulatory interactions in the form of dependencies between gene expression profiles and regulator binding patterns.
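
As a rough illustration of the objective described in the abstract, the sketch below scores a pair of clusterings by the Bayes factor of their contingency table, comparing a model in which the two cluster memberships are dependent against one in which they are independent. The abstract does not give the exact form of the Bayes factor, so the standard Dirichlet-multinomial form with a symmetric prior is assumed here; the function names and the `alpha` parameter are hypothetical and not taken from the paper.

```python
from math import lgamma


def contingency_table(labels_x, labels_y, kx, ky):
    """Count how often cluster i of one data set co-occurs with cluster j of the other."""
    table = [[0] * ky for _ in range(kx)]
    for i, j in zip(labels_x, labels_y):
        table[i][j] += 1
    return table


def log_dirichlet_multinomial(counts, alpha):
    """Log marginal likelihood of a sequence with these counts under a
    symmetric Dirichlet(alpha) prior (assumed prior, not from the paper)."""
    k = len(counts)
    n = sum(counts)
    return (lgamma(k * alpha) - lgamma(k * alpha + n)
            + sum(lgamma(alpha + c) - lgamma(alpha) for c in counts))


def log_bayes_factor(table, alpha=1.0):
    """Log Bayes factor: dependent cells versus independent row and column margins."""
    cells = [c for row in table for c in row]
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    dependent = log_dirichlet_multinomial(cells, alpha)
    independent = (log_dirichlet_multinomial(rows, alpha)
                   + log_dirichlet_multinomial(cols, alpha))
    return dependent - independent


# Paired samples whose cluster labels agree perfectly versus not at all.
aligned = contingency_table([0] * 10 + [1] * 10, [0] * 10 + [1] * 10, 2, 2)
unrelated = contingency_table([0] * 10 + [1] * 10, [0, 1] * 10, 2, 2)
print(log_bayes_factor(aligned), log_bayes_factor(unrelated))
```

Under these assumptions, a clustering pair whose groups co-occur systematically receives a higher score than one whose groups are unrelated. A clustering algorithm in the spirit of the abstract would adjust the groupings within each data set to increase this score; the optimization itself is not sketched here.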

EPrint Type: Article
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics; Theory & Algorithms
ID Code: 1693
Deposited By: Samuel Kaski
Deposited On: 28 November 2005