Evaluation of gene selection methods through artificial and real-world data concerning DNA microarray experiments
Francesca Ruffino, Giorgio Valentini and Marco Muselli
In: BITS 2005, 17-19 Mar 2005, Milano, Italy.
Several statistical and machine learning techniques have been proposed in the literature for gene selection in the context of DNA microarray analysis.
To evaluate objectively the quality of a gene selection method, such as GOLUB, SVM-RFE, or SNN-RFA, we cannot use real data, since the set of genes actually involved in the underlying biological process is not known.
In this work we apply TAGGED (Technique for Artificial Generation of Gene Expression Data) to generate artificial datasets whose statistical behavior is similar to that of data derived from DNA microarray experiments.
To evaluate the GOLUB, SVM-RFE and SNN-RFA gene selection methods, we applied them both to artificial gene expression data generated by TAGGED and to real gene expression data.
The results obtained for the three datasets show good agreement between performance on real and artificial data, which supports the validity of TAGGED. Furthermore, GOLUB and SNN-RFA achieve excellent results on artificial data, where the set of relevant genes is known.
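
The evaluation idea, scoring a selection method against a known set of relevant genes, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy generator is a simple Gaussian stand-in and is not the TAGGED model, the GOLUB method is assumed to be the signal-to-noise ranking of Golub et al., and the function names, sample sizes and shift value are hypothetical.

```python
import random
import statistics


def make_toy_data(n_samples=40, n_genes=200, n_relevant=10, shift=3.0, seed=0):
    """Toy artificial dataset (NOT the TAGGED model, just a Gaussian stand-in).

    Genes 0..n_relevant-1 are relevant by construction: their mean expression
    is shifted in class 1. All other genes are pure noise.
    """
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_samples)]
    relevant = set(range(n_relevant))
    data = [
        [rng.gauss(shift if (g in relevant and y == 1) else 0.0, 1.0)
         for g in range(n_genes)]
        for y in labels
    ]
    return data, labels, relevant


def signal_to_noise_scores(data, labels):
    """Per-gene score |mean1 - mean0| / (sd1 + sd0), assumed here for GOLUB."""
    scores = []
    for g in range(len(data[0])):
        x0 = [row[g] for row, y in zip(data, labels) if y == 0]
        x1 = [row[g] for row, y in zip(data, labels) if y == 1]
        num = abs(statistics.mean(x1) - statistics.mean(x0))
        den = statistics.stdev(x1) + statistics.stdev(x0)
        scores.append(num / den)
    return scores


def select_top(scores, k):
    """Indices of the k highest-scoring genes."""
    ranked = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    return set(ranked[:k])


data, labels, relevant = make_toy_data()
selected = select_top(signal_to_noise_scores(data, labels), k=len(relevant))

# Because the relevant set is known, the selection can be scored directly.
recall = len(selected & relevant) / len(relevant)
print(f"fraction of relevant genes recovered: {recall:.2f}")
```

On real microarray data the set `relevant` is unavailable, which is why a measure such as this recall can only be computed on artificial data.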