PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Statistical estimation of rational transducers applied to machine translation
Jesús Andrés, Alfons Juan and Francisco Casacuberta
Applied Artificial Intelligence, Volume 22, Number 1-2, pp. 4-22, 2008. ISSN 0883-9514

Abstract

The inference of finite-state transducers from bilingual training data plays an important role in many natural-language tasks, particularly in machine translation. However, only a few techniques exist for inferring such models. One of them is the grammatical inference and alignments for transducer inference (GIATI) technique, which has proven well suited to speech translation, text-input machine translation, and computer-assisted translation. GIATI is a heuristic technique that requires segmented training data (i.e., the input sentences and the output sentences must be segmented with the restriction that the input segments and the output segments are monotonically aligned). Purely statistical word-alignment models are used to obtain this segmented training data. This article revisits the technique. The main goal is to formally derive the complete GIATI technique using the classical expectation-maximization (EM) statistical estimation procedure. This new approach avoids a hard dependence on heuristic “external” statistical techniques (statistical alignments and n-grams). A first set of experimental results obtained in a machine-translation task is also reported as an initial validation of this new version of the finite-state transducer inference technique.

EPrint Type: Article
Project Keyword: UNSPECIFIED
Subjects: Learning/Statistics & Optimisation; Natural Language Processing; Theory & Algorithms
ID Code: 4506
Deposited By: Alfons Juan
Deposited On: 13 March 2009