PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

PALMA: mRNA to Genome Alignments using Large Margin Algorithms Schulze, U, Hepp, B, Ong, C, and Rätsch, G
U Schulze, B Hepp, Cheng Soon Ong and Gunnar Raetsch
Bioinformatics 2007.


Motivation: Despite many years of research on how to properly align sequences in the presence of sequencing errors, alternative splicing and micro-exons, the correct alignment of mRNA sequences to genomic DNA is still a challenging task. Results: We present a novel approach based on large margin learning that combines accurate plice site predictions with common sequence alignment techniques. By solving a convex optimization problem, our algorithm – called PALMA – tunes the parameters of the model such that true alignments score higher than other alignments. We study the accuracy of alignments of mRNAs containing artificially generated micro-exons to genomic DNA. In a carefully designed experiment, we show that our algorithm accurately identifies the intron boundaries as well as boundaries of the optimal local alignment. It outperforms all other methods: for 5702 artificially shortened EST sequences from C. elegans and human it correctly identifies the intron boundaries in all except two cases. The best other method is a recently proposed method called exalin which misaligns 37 of the sequences. Our method also demonstrates robustness to mutations, insertions and deletions, retaining accuracy even at high noise levels. Availability: Datasets for training, evaluation and testing, additional results and a stand-alone alignment tool implemented in C++ and python are available at

EPrint Type:Article
Project Keyword:Project Keyword UNSPECIFIED
Subjects:Learning/Statistics & Optimisation
Multimodal Integration
Theory & Algorithms
ID Code:3033
Deposited By:Gunnar Raetsch
Deposited On:02 September 2007