
Minimum Error Rate Training by Sampling the Translation Lattice
Samidh Chatterjee and Nicola Cancedda
In: Conference on Empirical Methods in Natural Language Processing (EMNLP 2010), 9-11 October 2010, Cambridge, Massachusetts, USA.

Abstract

Minimum Error Rate Training is the algorithm most commonly used for log-linear model parameter training in state-of-the-art Statistical Machine Translation systems. In its original formulation, the algorithm uses N-best lists output by the decoder to grow the Translation Pool that shapes the surface on which the actual optimization is performed. Recent work has extended the algorithm to use the entire translation lattice built by the decoder instead of N-best lists. We propose here a third, intermediate approach, which consists in growing the translation pool using samples randomly drawn from the translation lattice. We empirically measure an improvement in BLEU scores compared to training on N-best lists, without incurring the increase in computational complexity associated with operating on the whole lattice.
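
The abstract describes growing the MERT translation pool with hypotheses sampled from the decoder's translation lattice rather than taken from N-best lists. The minimal Python sketch below illustrates the general idea of drawing random complete paths from a lattice represented as a DAG; the data structure, the sample_paths function, and the exp(score)-proportional edge choice are illustrative assumptions, not the sampling scheme actually used in the paper.

import math
import random

def sample_paths(edges, start, end, num_samples, seed=0):
    # Draw num_samples random source-to-sink paths from a lattice.
    # edges maps a node id to a list of (next_node, phrase, score) tuples.
    # At each node the outgoing edge is chosen with probability proportional
    # to exp(score); this local choice is an illustrative assumption, not
    # necessarily the distribution used in the paper.
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        node, phrases, total = start, [], 0.0
        while node != end:
            outgoing = edges[node]
            weights = [math.exp(score) for _, _, score in outgoing]
            node, phrase, score = rng.choices(outgoing, weights=weights, k=1)[0]
            phrases.append(phrase)
            total += score
        samples.append((" ".join(phrases), total))
    return samples

# Toy lattice with two complete paths: 0 -> 1 -> 3 and 0 -> 2 -> 3.
lattice = {
    0: [(1, "the cat", -0.2), (2, "a cat", -0.9)],
    1: [(3, "sleeps", -0.1)],
    2: [(3, "is sleeping", -0.4)],
    3: [],
}
for hypothesis, score in sample_paths(lattice, start=0, end=3, num_samples=5):
    print("%.2f\t%s" % (score, hypothesis))

Each sampled path (a translation hypothesis with its score) would then be added to the translation pool on which the MERT line search is performed.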

EPrint Type: Conference or Workshop Item (Poster)
Project Keyword: UNSPECIFIED
Subjects: Natural Language Processing
ID Code: 7186
Deposited By: Nicola Cancedda
Deposited On: 08 March 2011