Statistical estimation of rational transducers applied to machine translation
The inference of finite-state transducers from bilingual training data plays an important role in many natural-language tasks, particularly in machine translation. However, only a few techniques exist for inferring such models. One of them is grammatical inference and alignments for transducer inference (GIATI), which has proven well suited to speech translation, text-input machine translation, and computer-assisted translation. GIATI is a heuristic technique that requires segmented training data: the input and output sentences must be segmented so that the input segments and the output segments are monotonically aligned. To obtain such segmented training data, purely statistical word-alignment models are used. This article revisits the technique. The main goal is to formally derive the complete GIATI technique using a classical expectation-maximization statistical estimation procedure. This new approach avoids a hard dependence on heuristic “external” statistical techniques (statistical alignments and n-grams). A first set of experimental results obtained on a machine-translation task is also reported as an initial validation of this new version of the finite-state transducer inference technique.