MMG: a probabilistic tool to identify submodules of metabolic pathways
Guido Sanguinetti, Josselin Noirel and Phillip C. Wright
Motivation: A fundamental task in systems biology is the identification of groups of genes that are involved in the cellular response to particular signals. At its simplest level, this often reduces to identifying biological quantities (mRNA abundance, enzyme concentrations, etc.) which are differentially expressed in two different conditions. Popular approaches involve using t-test statistics, based on modelling the data as arising from a mixture distribution. A common assumption of these approaches is that the data are independent and identically distributed; however, biological quantities are usually related through a complex (weighted) network of interactions, and often the more pertinent question is which subnetworks are differentially expressed, rather than which genes. Furthermore, in many interesting cases (such as high-throughput proteomics and metabolomics), only very partial observations are available, resulting in the need for efficient imputation techniques.
Results: We introduce Mixture Model on Graphs (MMG), a novel probabilistic model to identify differentially expressed submodules of biological networks and pathways. The method can easily incorporate information about weights in the network, is robust against missing data and can be easily generalized to directed networks. We propose an efficient sampling strategy to infer posterior probabilities of differential expression, as well as posterior probabilities over the model parameters. We assess our method on artificial data demonstrating significant improvements over standard mixture model clustering. Analysis of our model results on quantitative high-throughput proteomic data leads to the identification of biologically significant subnetworks, as well as the prediction of the expression level of a number of enzymes, some of which are then verified experimentally.