Predicting gene expression from heterogeneous data
Matteo Re and Giorgio Valentini
In: CIBB 2009, The Sixth International Conference on Bioinformatics and Biostatistics, 15-17 Oct 2009, Genova, Italy.
The complexity of gene expression and the elucidation of the mechanisms involved in its regulation constitute an extremely difficult challenge in modern bioinformatics, despite the large amount of information recently made available by high-throughput biotechnologies and genome-wide investigations.
In this contribution we investigated the effectiveness of ensemble systems for gene expression prediction. Because ensemble systems can integrate heterogeneous datasets, they make it possible to exploit not only promoter sequence-based datasets but also other sources of information, such as phylogenetic patterns of regulatory motifs and covalent histone modifications. To this end, we collected data from the literature and predicted the expression class of 2490 S. cerevisiae genes using an ensemble of Support Vector Machines trained on 4 different sources of data. The experimental results show that ensemble systems can improve gene expression prediction performance. Nevertheless, further investigation is required to find the best combination of datasets and data fusion methods for gene expression class prediction.
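The abstract does not state how the per-source SVMs are combined. The following is a minimal sketch, not the authors' method. It assumes a two-class problem with labels -1/+1, uses linear SVMs trained with the Pegasos stochastic subgradient method as a stand-in for whatever SVM solver was actually used, and uses late fusion that averages the per-source decision scores. It also replaces the yeast datasets with synthetic data. All function names (`train_linear_svm`, `ensemble_predict`, `make_sources`) and parameter values are hypothetical.

```python
import numpy as np


def add_bias(X):
    """Append a constant column so the bias is learned as an ordinary weight."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Linear soft-margin SVM fitted by Pegasos-style stochastic subgradient
    descent on the L2-regularized hinge loss. Labels y must be in {-1, +1}."""
    rng = np.random.default_rng(seed)
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)                  # decreasing step size
            violated = y[i] * (Xb[i] @ w) < 1.0    # margin check before update
            w *= 1.0 - eta * lam                   # step on the L2 term
            if violated:                           # hinge-loss subgradient step
                w += eta * y[i] * Xb[i]
    return w


def decision(w, X):
    """Real-valued SVM score; its sign is the predicted class."""
    return add_bias(X) @ w


def predict(w, X):
    return np.where(decision(w, X) >= 0, 1, -1)


def ensemble_predict(weights, sources):
    """Hypothetical late-fusion rule: average per-source scores, take the sign."""
    scores = np.mean([decision(w, X) for w, X in zip(weights, sources)], axis=0)
    return np.where(scores >= 0, 1, -1)


def make_sources(y, dims, shifts, rng):
    """Synthetic stand-in for heterogeneous data: one matrix per source, where
    only column 0 carries class signal and the other columns are noise."""
    sources = []
    for d, shift in zip(dims, shifts):
        X = rng.normal(size=(len(y), d))
        X[:, 0] += shift * y
        sources.append(X)
    return sources


rng = np.random.default_rng(42)
n_train, n_test = 300, 200
y_all = rng.choice([-1, 1], size=n_train + n_test)
# 4 sources with different dimensionality and informativeness (made-up values).
sources = make_sources(y_all, dims=[6, 4, 8, 3], shifts=[0.8, 1.0, 1.0, 1.2], rng=rng)

y_train, y_test = y_all[:n_train], y_all[n_train:]
train = [X[:n_train] for X in sources]
test = [X[n_train:] for X in sources]

# One SVM per data source, then fuse their outputs.
weights = [train_linear_svm(X, y_train, seed=k) for k, X in enumerate(train)]
single_acc = [float(np.mean(predict(w, X) == y_test)) for w, X in zip(weights, test)]
pred = ensemble_predict(weights, test)
ensemble_acc = float(np.mean(pred == y_test))

print("per-source accuracy:", np.round(single_acc, 3))
print("ensemble accuracy:", round(ensemble_acc, 3))
```

Averaging scores is only one possible fusion rule. Majority voting, weighted voting, or a meta-learner trained on the base classifiers' outputs are common alternatives. Choosing among them is part of the open question, noted in the abstract, of which datasets and fusion methods work best.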