A probabilistic framework for mismatch and profile string kernels.
A. Vinokourov, A. Soklakov and Craig Saunders
In: European Symposium on Artificial Neural Networks 2005, 27-29 April 2005, Bruges, Belgium.
There have recently been numerous applications of kernel methods in
the field of bioinformatics. In particular, the problem of
protein homology detection has served as a benchmark for the performance of
many new kernels which operate directly on strings (such as
amino-acid sequences). Several new kernels have been developed
and successfully applied to this type of data, including spectrum,
string, mismatch, and profile kernels. In this paper we introduce
a general probabilistic framework for string kernels which uses
the Fisher-kernel approach and includes spectrum, mismatch and
profile kernels, among others, as special cases. Beyond recovering
these kernels, the probabilistic model provides additional flexibility,
both in the definition of the kernel and in the re-weighting of
features through feature selection methods, prior knowledge, or
semi-supervised approaches that draw on sequence databases searched
with tools such as BLAST. We describe the framework in detail and
present preliminary experimental results that demonstrate the
applicability of the technique.
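For reference, the spectrum and mismatch kernels named above can be computed directly from their standard explicit feature maps: each k-mer in a sequence adds a count to every k-mer within Hamming distance m of it, and the kernel is the inner product of the two count vectors (the spectrum kernel is the case m = 0). The sketch below does this by brute force. It does not implement the probabilistic or Fisher-kernel formulation of this paper, and the names AMINO_ACIDS, mismatch_neighbourhood, mismatch_features and mismatch_kernel are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import combinations, product

# Assumption: the 20 standard amino-acid one-letter codes as the alphabet.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mismatch_neighbourhood(kmer, m, alphabet=AMINO_ACIDS):
    """Return every string of len(kmer) within Hamming distance m of kmer."""
    m = min(m, len(kmer))
    neighbours = set()
    for positions in combinations(range(len(kmer)), m):
        # Substituting each chosen position with every letter (including the
        # original one) also produces the neighbours at distance < m.
        for letters in product(alphabet, repeat=m):
            candidate = list(kmer)
            for pos, letter in zip(positions, letters):
                candidate[pos] = letter
            neighbours.add("".join(candidate))
    return neighbours


def mismatch_features(seq, k, m, alphabet=AMINO_ACIDS):
    """Sparse (k, m)-mismatch feature vector of seq as a Counter."""
    phi = Counter()
    for i in range(len(seq) - k + 1):
        for beta in mismatch_neighbourhood(seq[i:i + k], m, alphabet):
            phi[beta] += 1
    return phi


def mismatch_kernel(x, y, k, m, alphabet=AMINO_ACIDS):
    """Inner product of mismatch feature vectors; m = 0 gives the spectrum kernel."""
    fx = mismatch_features(x, k, m, alphabet)
    fy = mismatch_features(y, k, m, alphabet)
    # Indexing a Counter with a missing key returns 0 without inserting it.
    return sum(count * fy[beta] for beta, count in fx.items())


if __name__ == "__main__":
    print(mismatch_kernel("ACDA", "ACDC", k=2, m=0))  # shared 2-mers AC and CD
    print(mismatch_kernel("ACDA", "ACDC", k=2, m=1))
```

Enumerating neighbourhoods explicitly costs on the order of C(k, m)·(|Σ| − 1)^m strings per k-mer, so this version is only practical for small k and m; efficient implementations typically use a trie-based traversal instead.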