PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Learning an Expert from Human Annotations in Statistical Machine Translation: the Case of Out-of-Vocabulary Words
Wilker Aziz, Marc Dymetman, Shachar Mirkin, Lucia Specia, Nicola Cancedda and Ido Dagan
In: Proceedings EAMT (2010).


We present a general method for incorporating an “expert” model into a Statistical Machine Translation (SMT) system in order to improve its performance on a particular “area of expertise”, and apply this method to the specific task of finding adequate replacements for Out-of-Vocabulary (OOV) words. Candidate replacements are paraphrases and entailed phrases obtained from monolingual resources. These candidates are transformed into “dynamic biphrases”, generated at decoding time based on the context of each source sentence. Standard SMT features are enhanced with a number of new features aimed at scoring translations produced with different replacements. Active learning is used to discriminatively train the model parameters from human assessments of translation quality. The learning framework yields an SMT system that can handle sentences containing OOV words while guaranteeing that performance is not degraded for input sentences without OOV words. Results of experiments on English-French translation show that this method outperforms previous work on OOV words in terms of acceptability.
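The abstract describes scoring candidate translations with weighted features and learning the weights discriminatively from human judgments. The following is a minimal sketch of that general idea, assuming a plain linear scorer and a pairwise perceptron update; it is an illustrative stand-in, not the authors' training procedure, and it leaves out the active-learning step that selects which examples to annotate. The `Candidate` type, the helper functions, and the feature names (`lm`, `is_paraphrase`, `is_entailed`) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical: one candidate translation built with a particular OOV replacement."""
    text: str
    features: dict[str, float]  # feature name -> feature value


def score(weights: dict[str, float], cand: Candidate) -> float:
    """Linear model score: dot product of the weights with the candidate's features."""
    return sum(weights.get(name, 0.0) * value for name, value in cand.features.items())


def best(weights: dict[str, float], candidates: list[Candidate]) -> Candidate:
    """Return the highest-scoring candidate under the current weights."""
    return max(candidates, key=lambda c: score(weights, c))


def perceptron_update(weights: dict[str, float], preferred: Candidate,
                      other: Candidate, lr: float = 1.0) -> dict[str, float]:
    """Pairwise perceptron step (an assumed learner, for illustration only).

    If the model does not rank the human-preferred candidate strictly higher,
    shift the weights (in place) by lr * (preferred features - other features).
    """
    if score(weights, preferred) <= score(weights, other):
        for name in set(preferred.features) | set(other.features):
            delta = preferred.features.get(name, 0.0) - other.features.get(name, 0.0)
            weights[name] = weights.get(name, 0.0) + lr * delta
    return weights


# Usage: the model initially prefers b, a human annotator prefers a.
w = {"lm": 1.0, "is_paraphrase": 0.0, "is_entailed": 0.0}
a = Candidate("translation using a paraphrase", {"lm": -10.0, "is_paraphrase": 1.0})
b = Candidate("translation using an entailed phrase", {"lm": -9.0, "is_entailed": 1.0})
perceptron_update(w, preferred=a, other=b, lr=0.5)
print(w)  # {'lm': 0.5, 'is_paraphrase': 0.5, 'is_entailed': -0.5} (key order may vary)
```

After the single update the model ranks `a` above `b` (scores -4.5 and -5.0). A perceptron is used here only because it is the simplest discriminative learner from pairwise preferences.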

EPrint Type: Conference or Workshop Item (Paper)
Subjects: Learning/Statistics & Optimisation; Natural Language Processing
ID Code: 7295
Deposited By: Marc Dymetman
Deposited On: 17 March 2011