Findings of the 2011 Workshop on Statistical Machine Translation

Abstract: This paper presents the results of the WMT11 shared tasks, which included a translation task, a system combination task, and a task for machine translation evaluation metrics. We conducted a large-scale manual evaluation of 148 machine translation systems and 41 system combination entries. We used the ranking of these systems to measure how strongly automatic metrics correlate with human judgments of translation quality for 21 evaluation metrics. This year featured a Haitian Creole to English task translating SMS messages sent to an emergency response service in the aftermath of the Haitian earthquake. We also conducted a pilot ‘tunable metrics’ task to test whether optimizing a fixed system to different metrics would result in perceptibly different translation quality.