PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Complexity-based Phrase-table Filtering for Statistical Machine Translation
Nadi Tomeh, Nicola Cancedda and Marc Dymetman
In: Machine Translation Summit XII, 26-30 Aug 2009, Ottawa, Canada.


We describe an approach to filtering phrase tables in a Statistical Machine Translation system that relies on a statistical independence measure called Noise, first introduced by Moore (2004). Previous work by Johnson et al. (2007) also addressed phrase-table filtering, but it relied on a simpler independence measure, the p-value, which is theoretically less satisfying than Noise in this context. In this paper, we use Noise as the filtering criterion and show that, when the bi-phrase tables are partitioned into several sub-classes according to their complexity, Noise leads to improvements in BLEU score that are unreachable using the p-value, while allowing a similar amount of phrase-table pruning.
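
For readers unfamiliar with significance-based filtering, the following is a minimal sketch of the p-value baseline only, not of the Noise criterion proposed in the paper. It assumes the p-value is a one-sided Fisher exact test on sentence-level co-occurrence counts, which is one common choice; the abstract itself does not specify the test. The function names, the tuple layout of an entry, and the threshold are all hypothetical and chosen for illustration.

```python
from math import comb


def fisher_p_value(c_st, c_s, c_t, n):
    """One-sided Fisher exact p-value (assumed test, not stated in the abstract).

    c_st: sentence pairs containing both the source and the target phrase
    c_s:  sentence pairs containing the source phrase
    c_t:  sentence pairs containing the target phrase
    n:    total number of sentence pairs

    Returns the probability of observing at least c_st co-occurrences
    if the two phrases occurred independently (hypergeometric upper tail).
    """
    upper = min(c_s, c_t)
    tail = sum(
        comb(c_s, k) * comb(n - c_s, c_t - k)
        for k in range(c_st, upper + 1)
    )
    return tail / comb(n, c_t)


def filter_phrase_table(entries, n, threshold):
    """Keep entries whose p-value does not exceed the threshold.

    Each entry is a hypothetical tuple (source, target, c_st, c_s, c_t).
    """
    return [
        entry
        for entry in entries
        if fisher_p_value(entry[2], entry[3], entry[4], n) <= threshold
    ]


# Toy counts, invented purely for illustration.
toy_entries = [
    ("maison", "house", 3, 3, 3),  # always co-occur: strong association
    ("la", "house", 1, 5, 5),      # co-occur no more often than chance
]
kept = filter_phrase_table(toy_entries, n=10, threshold=0.05)
```

With these toy counts only the first entry survives the threshold. A Noise-based filter would replace `fisher_p_value` with the measure defined by Moore (2004), and the complexity-based partitioning described in the abstract would apply a separate threshold to each sub-class of bi-phrases; neither is reproduced here because the abstract does not give their definitions.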

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Learning/Statistics & Optimisation; Natural Language Processing
ID Code: 6145
Deposited By: Nicola Cancedda
Deposited On: 08 March 2010