Detecting conserved coding genomic regions through signal processing of nucleotide substitution patterns.
Matteo Re and Giulio Pavesi
Artificial Intelligence in Medicine
Objective: In the last few years several complete genome sequences have been made available to the research community. Annotating their complete inventory of protein-coding genes, however, has so far been an elusive goal. Classical ab initio gene prediction methods have greatly supported this task, but show notable weaknesses in predicting genes with unusual structural features. On the other hand, annotation based on similarity to already known genes in other species cannot detect genuinely novel genes, and it introduces a potential source of classification error when the similarity is to sequences erroneously annotated as protein coding. Finally, several methods for the functional classification and assessment of evolutionarily conserved regions have been proposed but, to our knowledge, signal processing techniques have not yet been applied to this problem, despite their proven usefulness at the single-genome level.
Results: In this article we introduce the use of signal processing in comparative genomics and propose a simple test that evaluates the coding potential of a pairwise genomic sequence alignment from the pattern and periodicity with which substitutions and gaps appear in the alignment. We assess the feasibility of our approach on an annotated set of human-mouse genomic alignments.
Conclusion: Results show that the application of signal processing techniques to sequence alignments can be a useful tool for the identification of evolutionarily conserved protein-coding regions.
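The idea of reading periodicity from substitution patterns can be illustrated with a minimal sketch. It is not the test proposed in the article, whose details the abstract does not give. It assumes that each alignment column is encoded as 1 (substitution or gap) or 0 (match), and that coding potential is scored as the spectral power of that signal at period 3, the codon length, relative to the signal variance. The function names `substitution_signal`, `power_at_period` and `period3_score` are hypothetical, as is the choice to treat gaps like substitutions.

```python
import cmath


def substitution_signal(seq_a, seq_b):
    """Encode an alignment as a binary signal: 1 where the aligned
    characters differ (substitution or gap), 0 where they match.
    Assumption: gaps ('-') are treated like substitutions."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return [0 if a == b else 1 for a, b in zip(seq_a.upper(), seq_b.upper())]


def power_at_period(signal, period=3):
    """Power of the mean-centred signal at frequency 1/period,
    computed as a single discrete Fourier transform coefficient."""
    n = len(signal)
    mean = sum(signal) / n
    coeff = sum(
        (x - mean) * cmath.exp(-2j * cmath.pi * k / period)
        for k, x in enumerate(signal)
    )
    return abs(coeff) ** 2 / n


def period3_score(seq_a, seq_b):
    """Ratio of period-3 power to signal variance (a hypothetical score).
    For an uncorrelated signal the expected ratio is about 1; values well
    above 1 suggest substitutions concentrated at every third position."""
    signal = substitution_signal(seq_a, seq_b)
    n = len(signal)
    mean = sum(signal) / n
    variance = sum((x - mean) ** 2 for x in signal) / n
    if variance == 0:
        return 0.0  # no substitutions (or all substitutions): no periodicity
    return power_at_period(signal, 3) / variance


if __name__ == "__main__":
    # Toy alignment with a substitution at every third position.
    codon_like = period3_score("ACG" * 10, "ACT" * 10)
    # Toy alignment with a substitution at every fourth position.
    other = period3_score("ACGT" * 6, "ACGA" * 6)
    print(codon_like, other)
```

In this toy setting the first alignment, which mimics substitutions concentrated at third codon positions, scores far higher than the second, whose substitutions recur with period 4. A real test would also need a significance threshold and a separate treatment of gap lengths, neither of which is attempted here.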