PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

A probabilistic framework for mismatch and profile string kernels.
A. Vinokourov, A. Soklakov and Craig Saunders
In: European Symposium on Artificial Neural Networks 2005, 27 - 29 April 2005, Bruges, Belgium.


There has recently been numerous applications of kernel methods in the field of bioinformatics. In particular, the problem of protein homology has served as a benchmark for the performance of many new kernels which operate directly on strings (such as amino-acid sequences). Several new kernels have been developed and successfully applied to this type of data, including spectrum, string, mismatch, and profile kernels. In this paper we introduce a general probabilistic framework for string kernels which uses the fisher-kernel approach and includes spectrum, mismatch and profile kernels, among others, as special cases. The use of a probabilistic model however provides additional flexibility both in definition and for the re-weighting of features through feature selection methods, prior knowledge or semi-supervised approaches which use data repositories such as BLAST. We give details of the framework and also give preliminary experimental results which show the applicability of the technique.

PDF - PASCAL Members only - Requires Adobe Acrobat Reader or other PDF viewer.
EPrint Type:Conference or Workshop Item (Paper)
Project Keyword:Project Keyword UNSPECIFIED
Subjects:Learning/Statistics & Optimisation
Theory & Algorithms
ID Code:1760
Deposited By:Craig Saunders
Deposited On:28 November 2005