PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Aligning words using matrix factorisation
Cyril Goutte, Kenji Yamada and Eric Gaussier
In: 42nd Annual Meeting of the Association for Computational Linguistics, July 25-26, 2004, Barcelona, Spain.

Abstract

Aligning words from sentences which are mutual translations is an important problem in different settings, such as bilingual terminology extraction, Machine Translation, or projection of linguistic features. Here, we view word alignment as matrix factorisation. In order to produce proper alignments, we show that factors must satisfy a number of constraints such as orthogonality. We then propose an algorithm for orthogonal non-negative matrix factorisation, based on a probabilistic model of the alignment data, and apply it to word alignment. This is illustrated on a French-English alignment task from the Hansard.
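
As a minimal sketch of the factorisation view described in the abstract, the toy example below writes a binary alignment matrix M as the product F x E^T of two word-to-cept assignment matrices. It is not the authors' algorithm: the sentence pair, the cept assignments, and every name (F, E, K, one_hot_rows, is_orthogonal) are assumptions made for illustration. It assumes each word belongs to exactly one cept, which is one simple way for the columns of each factor to be orthogonal.

```python
# Toy illustration with hypothetical data: a word alignment written as a
# product of two binary word-to-cept assignment matrices, M = F x E^T.

french = ["il", "ne", "mange", "pas"]
english = ["he", "does", "not", "eat"]

# Assumed assignments: each word belongs to exactly one cept (0, 1 or 2).
f_cept = [0, 1, 2, 1]  # il -> 0, ne -> 1, mange -> 2, pas -> 1
e_cept = [0, 1, 1, 2]  # he -> 0, does -> 1, not -> 1, eat -> 2
K = 3                  # number of cepts


def one_hot_rows(assignment, k):
    """Build a (words x cepts) binary matrix with a single 1 per row."""
    return [[1 if c == j else 0 for j in range(k)] for c in assignment]


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def is_orthogonal(a):
    """True if distinct columns of `a` have zero inner product."""
    gram = matmul(transpose(a), a)
    n = len(gram)
    return all(gram[i][j] == 0 for i in range(n) for j in range(n) if i != j)


F = one_hot_rows(f_cept, K)   # French words x cepts
E = one_hot_rows(e_cept, K)   # English words x cepts
M = matmul(F, transpose(E))   # M[i][j] == 1 iff the two words share a cept

# Read the aligned word pairs back off the reconstructed matrix.
aligned_pairs = [
    (french[i], english[j])
    for i in range(len(french))
    for j in range(len(english))
    if M[i][j] == 1
]
```

In this sketch the many-to-many link between "ne ... pas" and "does not" comes from both pairs sharing cept 1, while the single-cept constraint keeps every other word out of that group.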

EPrint Type: Conference or Workshop Item (Paper)
Additional Information: http://www.xrce.xerox.com/Publications/Display-Abstract.php?ReportID=1214
Subjects: Learning/Statistics & Optimisation; Natural Language Processing
ID Code: 552
Deposited By: Cyril Goutte
Deposited On: 25 December 2004