Efficient Computation of Gap-weighted String Kernels on Large Alphabets

Abstract

We present a sparse dynamic programming algorithm that, given two strings $s$ and $t$, a gap penalty $\lambda$, and an integer $p$, computes the value of the gap-weighted length-$p$ subsequences kernel. The algorithm runs in time $O(p|M| \log \min(|s|, |t|))$, where $M = \{(i, j) \mid s_i = t_j\}$ is the set of character matches between the two sequences. We evaluate the new algorithm empirically on synthetic data against a full dynamic programming approach and a trie-based algorithm. In these experiments, the full dynamic programming approach is the fastest on short strings, and on long strings when the alphabet is small. On large alphabets, the new sparse dynamic programming algorithm is the most efficient. On medium-sized alphabets, the trie-based approach is best if the maximum number of allowed gaps is strongly restricted.
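To make the kernel concrete, here is a minimal Python sketch of the full dynamic programming baseline mentioned above. It is not the sparse algorithm itself. It assumes the standard gap-weighted definition in which each occurrence of a length-$p$ subsequence $u$ at positions $i_1 < \dots < i_p$ contributes $\lambda^{i_p - i_1 + 1}$, and $K(s, t) = \sum_u \phi_u(s)\,\phi_u(t)$. The function names are hypothetical, and the brute-force enumeration exists only to check the recursion on short strings.

```python
from collections import defaultdict
from itertools import combinations


def match_set(s, t):
    """The match set M = {(i, j) | s_i = t_j}, with 0-based indices."""
    return [(i, j) for i in range(len(s)) for j in range(len(t)) if s[i] == t[j]]


def gap_weighted_kernel_dp(s, t, p, lam):
    """Full O(p*|s|*|t|) dynamic program (assumed definition, see lead-in)."""
    if p < 1:
        raise ValueError("p must be a positive integer")
    n, m = len(s), len(t)
    if n < p or m < p:
        return 0.0
    # kp[i][j]: weighted sum over common length-l subsequences whose last
    # characters are s[i-1] and t[j-1] (1-based; row/column 0 stay zero).
    # Entries are non-zero only on the match set M.
    kp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if s[i - 1] == t[j - 1]:
                kp[i][j] = lam * lam  # a single character spans 1 in each string
    for _ in range(2, p + 1):
        # acc[i][j] = sum over i' <= i, j' <= j of
        #             kp[i'][j'] * lam ** ((i - i') + (j - j')),
        # built by inclusion-exclusion over the two prefix directions.
        acc = [[0.0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                acc[i][j] = (kp[i][j] + lam * acc[i - 1][j]
                             + lam * acc[i][j - 1]
                             - lam * lam * acc[i - 1][j - 1])
        # Extend every shorter common subsequence ending strictly before (i, j);
        # the extra lam * lam accounts for the new last position in each string.
        new_kp = [[0.0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                if s[i - 1] == t[j - 1]:
                    new_kp[i][j] = lam * lam * acc[i - 1][j - 1]
        kp = new_kp
    return sum(sum(row) for row in kp)


def gap_weighted_kernel_brute(s, t, p, lam):
    """Exponential-time reference: explicit feature maps phi_u(x)."""
    def features(x):
        phi = defaultdict(float)
        for idx in combinations(range(len(x)), p):
            u = "".join(x[k] for k in idx)
            phi[u] += lam ** (idx[-1] - idx[0] + 1)
        return phi

    fs, ft = features(s), features(t)
    return sum(w * ft[u] for u, w in fs.items() if u in ft)
```

For example, `gap_weighted_kernel_dp("ab", "ab", 2, 0.5)` returns 0.0625, i.e. $\lambda^4$, because "ab" is the only common length-2 subsequence and it spans two positions in each string. Note that each layer of this recursion visits all $|s||t|$ cells even though only the entries on $M$ can be non-zero; that redundancy is what a sparse method can avoid when $|M|$ is small, as it tends to be on large alphabets.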