Biological specifications for a synthetic gene expression data generation model
Francesca Ruffino, Marco Muselli and Giorgio Valentini
Computational Intelligence Methods for Bioinformatics and Biostatistics
Lecture Notes in Computer Science
An open problem in gene expression data analysis is evaluating how well gene selection methods discover biologically
relevant sets of genes. The problem is difficult because the entire set of genes involved in a specific biological process is usually unknown
or only partially known, which makes a correct comparison between different gene selection methods infeasible.
A natural solution to this problem is to develop an artificial model that generates gene expression data, so that the set of
biologically relevant genes is known in advance.
The models proposed in the literature, although useful for a preliminary evaluation of gene selection methods,
do not explicitly consider the biological characteristics of gene expression data.
The main aim of this work is to identify the biological characteristics that must be considered when designing a model for
validating gene selection methods based on the analysis of DNA microarray data.