PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Computation of Similarity Measures for Sequential Data using Generalized Suffix Trees
Konrad Rieck, Pavel Laskov and Sören Sonnenburg
In: NIPS 2006, 4-8 Dec 2006, Vancouver, Canada.


We propose a generic algorithm for computation of similarity measures for sequential data. The algorithm uses generalized suffix trees for efficient calculation of various kernel, distance and non-metric similarity functions. Its worst-case run-time is linear in the length of sequences and independent of the underlying embedding language, which can cover words, k-grams or all contained subsequences. Experiments with network intrusion detection, DNA analysis and text processing applications demonstrate the utility of distances and similarity coefficients for sequences as alternatives to classical kernel functions.
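
To make the notion of an embedding language concrete, the following minimal sketch embeds two sequences by their k-gram counts and evaluates one kernel, one distance and one similarity coefficient on that embedding. It is an illustration only and is not the suffix-tree algorithm of the paper: it uses plain hash-based counting, and the function names (`kgram_counts`, `linear_kernel`, `manhattan_distance`, `jaccard_coefficient`) as well as the choice of these three particular measures are assumptions made for this example.

```python
from collections import Counter


def kgram_counts(seq, k):
    """Count every contiguous k-gram of seq (hypothetical helper)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def linear_kernel(x, y, k):
    """Inner product of the k-gram count vectors of x and y."""
    cx, cy = kgram_counts(x, k), kgram_counts(y, k)
    # Only k-grams present in both sequences contribute to the sum.
    return sum(cx[w] * cy[w] for w in cx.keys() & cy.keys())


def manhattan_distance(x, y, k):
    """Sum of absolute differences of k-gram counts."""
    cx, cy = kgram_counts(x, k), kgram_counts(y, k)
    # Counter returns 0 for k-grams missing from one of the sequences.
    return sum(abs(cx[w] - cy[w]) for w in cx.keys() | cy.keys())


def jaccard_coefficient(x, y, k):
    """Shared distinct k-grams divided by all distinct k-grams."""
    sx, sy = set(kgram_counts(x, k)), set(kgram_counts(y, k))
    union = sx | sy
    # Assumption: two sequences without any k-gram get similarity 0.
    return len(sx & sy) / len(union) if union else 0.0


if __name__ == "__main__":
    x, y = "abab", "baba"
    print(linear_kernel(x, y, 2))
    print(manhattan_distance(x, y, 2))
    print(jaccard_coefficient(x, y, 2))
```

All three measures reduce to a pass over the k-grams that occur in either sequence, which is the kind of shared structure a generic algorithm can exploit. The sketch handles only a fixed k; covering all contained subsequences this way would require enumerating them explicitly.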

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Information Retrieval & Textual Information Access
ID Code: 2787
Deposited By: Pavel Laskov
Deposited On: 22 November 2006