Estimating the Sentence-Level Quality of Machine Translation Systems
We investigate the problem of predicting the quality of sentences produced by machine translation systems when reference translations are not available. We address the problem as a regression task and propose a method that takes into account the contribution of different features. We experiment with this method on translations produced by various MT systems for different language pairs, annotated with quality scores both automatically and manually. Results show that our method yields good estimates and that identifying a reduced set of relevant features plays an important role. The experiments also highlight a number of outstanding features that were consistently selected as the most relevant and could be used in different ways to improve MT performance or to enhance MT evaluation.