PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Modeling Sequence Evolution with Kernel Methods
Margherita Bresco, Marco Turchi, Tijl De Bie and Nello Cristianini
Computational Optimization and Applications Volume 38, Number 2, pp. 281-298, 2007.


We model the evolution of biological and linguistic sequences by comparing their statistical properties. This comparison is performed by means of efficiently computable kernel functions that take two sequences as input and return a measure of statistical similarity between them. We show how such kernels make it possible to reconstruct the phylogenetic tree of primates based on the mitochondrial DNA (mtDNA) of existing animals, and the phylogenetic tree of Indo-European and other languages based on sample documents from existing languages. Kernel methods provide a convenient framework for many pattern analysis tasks, and recent advances have focused on efficient methods for sequence comparison and analysis. While a large toolbox of algorithms has been developed to analyze data by using kernels, in this paper we demonstrate their use in combination with standard phylogenetic reconstruction algorithms and visualization methods.
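
As an illustration of what a kernel that "takes two sequences as input and returns a measure of statistical similarity" can look like, the following is a minimal sketch in Python. It assumes a k-mer spectrum kernel, which is one common choice of string kernel; the abstract does not say which kernels the paper uses, so the kernel, the function names, the default k=3 and the toy sequences are all assumptions made for this example.

```python
from collections import Counter
from math import sqrt


def spectrum(seq, k):
    """Count every contiguous substring of length k (k-mer) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(s, t, k=3):
    """Inner product of the k-mer count vectors of s and t.

    Assumption: a spectrum kernel stands in for whatever kernels the paper uses.
    """
    cs, ct = spectrum(s, k), spectrum(t, k)
    # Iterate over the smaller table; k-mers missing from the other count as 0.
    if len(cs) > len(ct):
        cs, ct = ct, cs
    return sum(count * ct[kmer] for kmer, count in cs.items())


def normalized_kernel(s, t, k=3):
    """Cosine-normalized kernel, so sequence length does not dominate."""
    denom = sqrt(spectrum_kernel(s, s, k) * spectrum_kernel(t, t, k))
    return spectrum_kernel(s, t, k) / denom if denom else 0.0


def kernel_distance(s, t, k=3):
    """Distance induced by the normalized kernel: sqrt(2 - 2 * K(s, t))."""
    return sqrt(max(0.0, 2.0 - 2.0 * normalized_kernel(s, t, k)))


if __name__ == "__main__":
    # Toy sequences invented for this example; they are not real mtDNA data.
    toy = {
        "a": "ACGTACGTTGCAACGT",
        "b": "ACGTACGATGCAACGT",
        "c": "TTTTGGGGCCCCAAAA",
    }
    names = sorted(toy)
    for x in names:
        print(x, [round(kernel_distance(toy[x], toy[y]), 3) for y in names])
```

The pairwise distances printed at the end form a matrix of the kind that a standard distance-based tree-building algorithm (for example neighbor joining) accepts as input, which is one plausible way to combine kernels with phylogenetic reconstruction.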

EPrint Type: Article
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Learning/Statistics & Optimisation
          Natural Language Processing
          Theory & Algorithms
ID Code: 3816
Deposited By: Tijl De Bie
Deposited On: 25 February 2008