PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Complexity Based Phrase-Table Filtering for Statistical Machine Translation
Nadi Tomeh, Nicola Cancedda and Marc Dymetman
MT Summit 2009.

Abstract

We describe an approach for filtering phrase tables in a Statistical Machine Translation system, which relies on a statistical independence measure called Noise, first introduced by Moore (2004). While previous work by Johnson et al. (2007) also addressed the question of phrase table filtering, it relied on a simpler independence measure, the p-value, which is theoretically less satisfying than Noise in this context. In this paper, we use Noise as the filtering criterion, and show that when we partition the bi-phrase tables into several sub-classes according to their complexity, using Noise leads to improvements in BLEU score that are unreachable using the p-value, while allowing a similar amount of pruning of the phrase tables.
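For orientation, the p-value baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the p-value is the one-sided Fisher exact test on the co-occurrence counts of a source phrase and a target phrase, and the function names, the count layout `(c_st, c_s, c_t, n)` and the threshold value are all hypothetical. The Noise measure itself is not reproduced here, since the abstract does not define it.

```python
from math import comb


def fisher_p_value(c_st: int, c_s: int, c_t: int, n: int) -> float:
    """One-sided Fisher exact test p-value (hypergeometric upper tail).

    c_st: sentence pairs containing both the source and the target phrase
    c_s:  sentence pairs containing the source phrase
    c_t:  sentence pairs containing the target phrase
    n:    total number of sentence pairs
    Returns the probability, under independence, of observing at least
    c_st co-occurrences.
    """
    upper = min(c_s, c_t)
    total = comb(n, c_t)
    tail = sum(
        comb(c_s, k) * comb(n - c_s, c_t - k)
        for k in range(c_st, upper + 1)
    )
    return tail / total


def filter_phrase_table(table: dict, threshold: float) -> dict:
    """Keep only bi-phrases whose p-value is at or below the threshold.

    `table` maps (source, target) to (c_st, c_s, c_t, n); this layout is
    an assumption made for the example.
    """
    return {
        pair: counts
        for pair, counts in table.items()
        if fisher_p_value(*counts) <= threshold
    }


if __name__ == "__main__":
    toy_table = {
        ("a", "b"): (1, 1, 1, 4),      # weak evidence of association
        ("c", "d"): (5, 5, 5, 1000),   # strong evidence of association
    }
    print(filter_phrase_table(toy_table, threshold=0.01))
```

A smaller p-value means the observed co-occurrence is less likely to be due to chance, so a stricter threshold prunes more of the table. The abstract's claim is that replacing this criterion with Noise, applied per complexity class, gives better BLEU at a comparable pruning level.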

EPrint Type: Article
Subjects: Learning/Statistics & Optimisation; Natural Language Processing
ID Code: 5868
Deposited By: Marc Dymetman
Deposited On: 08 March 2010