# Complexity Based Phrase-Table Filtering for Statistical Machine Translation

## Abstract

We describe an approach for filtering phrase tables in a Statistical Machine Translation system, which relies on a statistical independence measure called Noise, first introduced by Moore (2004). While previous work by Johnson et al. (2007) also addressed the question of phrase-table filtering, it relied on a simpler independence measure, the p-value, which is theoretically less satisfying than Noise in this context. In this paper, we use Noise as the filtering criterion, and show that when we partition the bi-phrase tables into several sub-classes according to their complexity, using Noise leads to improvements in BLEU score that are unreachable using the p-value, while allowing a similar amount of pruning of the phrase tables.