Linguistically Enriched Word-Sequence Kernels for Discriminative Language Modeling
This chapter introduces a method for exploiting background linguistic resources in statistical machine translation. Morphological, syntactic, and possibly semantic properties of words are combined by means of an enriched word-sequence kernel. In contrast to alternative formulations, linguistic resources are integrated so as to generate rich composite features defined across the various word representations. Word-sequence kernels find natural applications in discriminative language modeling, where they can help correct specific problems of the translation process. As a first step in this direction, experiments on an artificial problem, the detection of word misordering, demonstrate the value of the proposed kernel construction.
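To make the idea of composite features across word representations concrete, the following is a minimal sketch, not necessarily the chapter's exact formulation. It assumes each word is a tuple of factors (here surface form, lemma, and part of speech), that two words are compared by a weighted count of the factors on which they agree, and that the sequence kernel sums, over all pairs of possibly gapped length-n subsequences, the product of these word similarities, discounted by a gap-decay factor. The names `word_similarity`, `sequence_kernel`, `weights`, and `decay` are hypothetical, and the brute-force enumeration is chosen for readability rather than efficiency.

```python
from itertools import combinations


def word_similarity(u, v, weights):
    """Weighted count of the factors on which two words agree.

    u and v are tuples of factors, e.g. (surface, lemma, pos).
    The factor set and the weights are illustrative assumptions.
    """
    return sum(w for w, a, b in zip(weights, u, v) if a == b)


def sequence_kernel(s, t, n, decay, weights):
    """Gap-weighted word-sequence kernel of order n with soft word matching.

    Enumerates every pair of length-n subsequences (gaps allowed) of s and t,
    multiplies the word similarities position by position, and discounts each
    pair by decay ** (span in s + span in t). Exponential cost: for short
    sentences only.
    """
    total = 0.0
    for idx_s in combinations(range(len(s)), n):
        span_s = idx_s[-1] - idx_s[0] + 1
        for idx_t in combinations(range(len(t)), n):
            span_t = idx_t[-1] - idx_t[0] + 1
            match = 1.0
            for i, j in zip(idx_s, idx_t):
                match *= word_similarity(s[i], t[j], weights)
                if match == 0.0:
                    break  # one mismatching position zeroes the product
            total += decay ** (span_s + span_t) * match
    return total


# Toy sentences: no surface form is shared, but lemmas and POS tags overlap.
s = [("the", "the", "DET"), ("cats", "cat", "NOUN"), ("sleep", "sleep", "VERB")]
t = [("a", "a", "DET"), ("cat", "cat", "NOUN"), ("sleeps", "sleep", "VERB")]

surface_only = sequence_kernel(s, t, n=2, decay=0.5, weights=(1, 0, 0))
enriched = sequence_kernel(s, t, n=2, decay=0.5, weights=(1, 1, 1))
print(surface_only)  # 0.0
print(enriched)      # 0.40625
```

Because the word similarity is a sum over factors and the kernel multiplies it across positions, expanding the product yields terms that mix factors, such as a part-of-speech match at one position followed by a lemma match at the next. In this sketch, that expansion is what produces features defined across the different word representations; the surface-only kernel sees no similarity between the two toy sentences, whereas the enriched one does.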