Relational learning of biological networks
Cyril Combe, Vincent Schächter, Stan Matwin and Florence d'Alché-Buc
In: First FEBS Advanced Course on Systems Biology, 12-18 Mar 2005, Gosau, Austria.
The last few years have seen many different approaches to the reconstruction of biological networks. These approaches differ essentially in the kind of data used and in the models of biological networks considered. The data can be either numerical, such as time series generated by microarray experiments, or symbolic, such as annotation databases or ontologies extracted from scientific articles by text-mining algorithms. The models considered usually have graph structures and involve different kinds of objects, such as genes, proteins, metabolites or reactions. To deal with heterogeneous data and to represent highly relational models, it appears appropriate to infer biological networks with relational learning techniques.
As a proof of concept, we used the Inductive Logic Programming (ILP; Muggleton et al., 1994) system Progol to learn the concept of gene regulation from gene expression data. The approach is the following:
-We consider a known gene network with associated expression data and we represent both in first order logic
-We learn a first order logic definition of gene regulation with Progol
-We try to discover potential regulations using the definition of regulation produced by Progol
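The first step above, representing a known network and its expression data in first-order logic, can be sketched as follows. This is only an illustration under stated assumptions: the predicate names `regulates` and `expression`, the gene names `g1` and `g2`, and the toy expression levels are hypothetical, and the facts are written in generic Prolog-style syntax rather than in the exact input format expected by Progol.

```python
def to_fact(predicate, *args):
    """Format one ground fact in Prolog-style syntax, e.g. 'regulates(g1, g2).'."""
    return f"{predicate}({', '.join(str(a) for a in args)})."


def encode_network(edges):
    """Turn known regulator -> target edges into positive examples."""
    return [to_fact("regulates", regulator, target) for regulator, target in edges]


def encode_expression(levels):
    """Turn discretized expression levels into background facts.

    `levels` maps a gene name to its list of levels, one per time point.
    """
    facts = []
    for gene in sorted(levels):
        for t, level in enumerate(levels[gene]):
            facts.append(to_fact("expression", gene, f"t{t}", level))
    return facts


# Hypothetical toy data: one known regulation and two short level series.
network = [("g1", "g2")]
expression = {
    "g1": ["high", "high", "low"],
    "g2": ["low", "high", "high"],
}

positives = encode_network(network)
background = encode_expression(expression)

for fact in positives + background:
    print(fact)
```

In such a setup, the `regulates` facts would play the role of positive examples and the `expression` facts that of background knowledge for the learner.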
To represent expression data in a compact way, we discretized it in terms of expression levels, variation directions and time. The time discretization uses the notion of time intervals. We equipped the system with predicates able to capture relations between intervals, inspired by the formalism introduced by Allen et al. (1994); this constitutes a new approach to dealing with time series in ILP.
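The discretization and the interval relations described above can be illustrated with a minimal sketch. The thresholds, the tolerance `eps`, the label names and the toy series are assumptions made for the example, not the settings used in our experiments; the relation names follow the thirteen basic relations of Allen's interval algebra.

```python
def discretize_level(value, low, high):
    """Map a numeric expression value to a symbolic level."""
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "medium"


def direction(prev, curr, eps):
    """Classify the variation between two consecutive measurements."""
    delta = curr - prev
    if delta > eps:
        return "up"
    if delta < -eps:
        return "down"
    return "steady"


def direction_intervals(series, eps):
    """Merge consecutive steps sharing a direction into (start, end, direction)."""
    intervals = []
    for t in range(1, len(series)):
        d = direction(series[t - 1], series[t], eps)
        if intervals and intervals[-1][2] == d:
            start = intervals[-1][0]
            intervals[-1] = (start, t, d)  # extend the current interval
        else:
            intervals.append((t - 1, t, d))
    return intervals


def allen_relation(a, b):
    """Return the Allen relation of interval a with respect to interval b.

    Intervals are tuples whose first two items are start and end, start < end.
    """
    a_s, a_e = a[0], a[1]
    b_s, b_e = b[0], b[1]
    if a_e < b_s:
        return "before"
    if a_e == b_s:
        return "meets"
    if a_s == b_s and a_e == b_e:
        return "equals"
    if a_s == b_s:
        return "starts" if a_e < b_e else "started_by"
    if a_e == b_e:
        return "finishes" if a_s > b_s else "finished_by"
    if b_s < a_s and a_e < b_e:
        return "during"
    if a_s < b_s and b_e < a_e:
        return "contains"
    if a_s < b_s < a_e < b_e:
        return "overlaps"
    if b_e < a_s:
        return "after"
    if b_e == a_s:
        return "met_by"
    return "overlapped_by"


# Hypothetical toy series for two genes sampled at the same five time points.
g1 = direction_intervals([1.0, 2.0, 3.0, 3.0, 2.0], eps=0.1)
g2 = direction_intervals([1.0, 1.0, 2.0, 3.0, 3.0], eps=0.1)

print(g1)  # [(0, 2, 'up'), (2, 3, 'steady'), (3, 4, 'down')]
print(g2)  # [(0, 1, 'steady'), (1, 3, 'up'), (3, 4, 'steady')]
print(allen_relation(g1[0], g2[1]))  # overlaps
```

Here the rise of the first gene overlaps the rise of the second one, which is the kind of temporal pattern a learned definition of regulation could refer to.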
The system has been successfully tested on artificial datasets generated by different kinds of dynamic systems. It has also been tested on two real datasets, one related to the SOS DNA repair network of E. coli and one related to the yeast cell cycle.
Current work follows three complementary directions:
-We try to use other sources of information, such as the Gene Ontology database or metabolic datasets
-We are working on new models combining gene regulatory networks and metabolic pathways
-We are currently investigating methods combining first order logic and probabilities (see De Raedt et al., 2004)
Muggleton, S., et al., Inductive Logic Programming: Theory and Methods (1994), JLP, 19-20, 629
Allen, J., et al., Actions and Events in Interval Temporal Logic (1994), Rochester TechReport, 521, 1
De Raedt, L., et al., Probabilistic Inductive Logic Programming (2004), ALT, 15, 19