PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Accurate Splice Site Detection for Caenorhabditis elegans
Gunnar Raetsch and Sören Sonnenburg
In: Kernel Methods in Computational Biology. Computational Molecular Biology. (2004) MIT Press, London, England, pp. 277-298. ISBN 0262195097

Abstract

We propose a new system for predicting the splice form of Caenorhabditis elegans genes. As a first step, we generate a clean set of genes from available expressed sequence tags (ESTs) and complete complementary DNA (cDNA) sequences. From all such genes we then generate potential acceptor and donor sites, as any gene finder would require. This yields a clean set of true and decoy splice sites. In a second step, we use support vector machines (SVMs) with appropriately designed kernels to learn to distinguish between true and decoy sites. Using the newly generated data and the novel kernels, we considerably improve on our previous results for the same task. In the last step, we design and test a new splice finder system that combines the SVM predictions with additional statistical information about splicing. Using this system, we are able to predict the exon-intron structure of a given gene with known translation initiation and stop codon sites. The system has been tested successfully on a newly generated set of genes and compared with GenScan. We found that our system predicts the correct splice form for more than 92% of these genes, whereas GenScan achieves only 77.5% accuracy.
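
The candidate-generation step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes the canonical consensus dinucleotides (GT at the start of an intron for donors, AG at the end for acceptors), which the abstract does not state, and the function names `candidate_splice_sites` and `label_sites` are hypothetical. Every consensus occurrence is treated as a candidate; those matching a confirmed site are labelled true and the rest become decoys.

```python
def candidate_splice_sites(seq):
    """Return 0-based start positions of all GT (donor) and AG (acceptor)
    dinucleotides in seq. Assumes canonical consensus sites only."""
    seq = seq.upper()
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    return donors, acceptors


def label_sites(candidates, true_sites):
    """Pair each candidate position with True (confirmed site) or False (decoy)."""
    true_set = set(true_sites)
    return [(pos, pos in true_set) for pos in candidates]


# Toy sequence and a made-up confirmed donor position, for illustration only.
donors, acceptors = candidate_splice_sites("ATGGTAAGTTTCAGGT")
labelled_donors = label_sites(donors, true_sites=[3])
```

The labelled positions, together with a sequence window around each one, would then serve as training examples for a classifier such as an SVM.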

EPrint Type: Book Section
Project Keyword: UNSPECIFIED
Subjects: Learning/Statistics & Optimisation; Theory & Algorithms
ID Code: 832
Deposited By: Gunnar Rätsch
Deposited On: 01 January 2005