Approximate Kernels for Trees
Konrad Rieck, Tammo Krueger and Ulf Brefeld
FIRST Reports 5/2008, Fraunhofer Institute FIRST
Convolution kernels for trees provide an effective means of learning with tree-structured data, such as parse trees of natural language sentences. Unfortunately,
the computation time of tree kernels is quadratic in the size of the trees because all pairs
of nodes must be compared, which renders convolution kernels inapplicable to large trees. In this paper, we propose a simple but efficient approximation technique
for tree kernels. The approximate tree kernel (ATK) accelerates computation by
selecting a sparse and discriminative subset of subtrees using a linear program.
The kernel allows for incorporating domain knowledge and controlling the overall
computation time through additional constraints. Experiments on natural language
processing and web spam detection tasks demonstrate the efficiency
of the approximate kernels. We observe run-time improvements of two orders
of magnitude while preserving the discriminative expressiveness and classification
rates of regular convolution kernels.
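
The following minimal sketch illustrates why a regular convolution kernel compares all pairs of nodes, and how restricting the comparison to a selected subset reduces that work. It rests on several assumptions not stated in the abstract: subtrees are selected by the label of their root node, the selected set is supplied by hand rather than obtained from a linear program, and the counting function is a simplified variant of the usual shared-subtree recursion. All names (Node, count_common, tree_kernel, selected) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Set


@dataclass
class Node:
    """A labeled tree node with ordered children (hypothetical helper type)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def nodes(tree: Node) -> Iterator[Node]:
    """Yield every node of the tree in pre-order."""
    yield tree
    for child in tree.children:
        yield from nodes(child)


def production(node: Node):
    """The node label together with the labels of its direct children."""
    return (node.label, tuple(c.label for c in node.children))


def count_common(a: Node, b: Node, lam: float = 1.0) -> float:
    """Weighted count of shared subtrees rooted at a and b.

    Simplified variant of the usual convolution recursion: nodes with
    different productions share nothing; matching leaves contribute lam.
    """
    if production(a) != production(b):
        return 0.0
    result = lam
    for ca, cb in zip(a.children, b.children):
        result *= 1.0 + count_common(ca, cb, lam)
    return result


def tree_kernel(x: Node, z: Node, selected: Optional[Set[str]] = None) -> float:
    """Sum count_common over node pairs.

    With selected=None every pair of nodes is compared (quadratic cost).
    With a set of labels, only nodes carrying a selected label are
    considered as subtree roots (assumption: selection is by root label).
    """
    xs = [n for n in nodes(x) if selected is None or n.label in selected]
    zs = [n for n in nodes(z) if selected is None or n.label in selected]
    return sum(count_common(a, b) for a in xs for b in zs)


# Two small example trees.
t1 = Node("S", [Node("A"), Node("B")])
t2 = Node("S", [Node("A"), Node("C")])

full = tree_kernel(t1, t1)            # all node pairs compared
sparse = tree_kernel(t1, t1, {"S"})   # only subtrees rooted at "S"
```

In this sketch the number of compared pairs shrinks from the product of the tree sizes to the product of the numbers of selected nodes, which is the source of the speed-up; which labels to keep is exactly the choice the abstract delegates to a linear program.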