PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

A Statistical Model for Unsupervised and Semi-supervised Transliteration Mining
Hassan Sajjad, Alexander Fraser and Helmut Schmid
In: ACL 2012, 8-14 July 2012, Jeju, Republic of Korea.


We propose a novel model to automatically extract transliteration pairs from parallel corpora. Our model is efficient, language-pair independent, and mines transliteration pairs in a consistent fashion in both unsupervised and semi-supervised settings. We model transliteration mining as an interpolation of transliteration and non-transliteration sub-models. We evaluate on the NEWS 2010 shared task data and on parallel corpora, obtaining competitive results.
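
To illustrate what an interpolation of two sub-models can look like, here is a minimal sketch. It is not the authors' implementation: the function names, the weight `lam`, and the example probabilities are hypothetical. The posterior computation is only one common way to use such a mixture for deciding whether a word pair is a transliteration; the abstract does not say how the decision is made.

```python
def interpolate(p_translit: float, p_non_translit: float, lam: float) -> float:
    """Mixture probability of a word pair.

    lam is the (assumed) weight of the non-transliteration sub-model;
    1 - lam is the weight of the transliteration sub-model.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * p_translit + lam * p_non_translit


def posterior_non_translit(p_translit: float, p_non_translit: float, lam: float) -> float:
    """Share of the mixture probability explained by the non-transliteration sub-model."""
    total = interpolate(p_translit, p_non_translit, lam)
    if total == 0.0:
        raise ValueError("both sub-models assign zero probability")
    return lam * p_non_translit / total


# Hypothetical sub-model probabilities for a single word pair.
mixture = interpolate(p_translit=0.2, p_non_translit=0.1, lam=0.5)
posterior = posterior_non_translit(p_translit=0.2, p_non_translit=0.1, lam=0.5)
```

Under this sketch, a pair with a low `posterior` value is explained mostly by the transliteration sub-model, so it would be a candidate transliteration pair.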

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Natural Language Processing
ID Code: 9542
Deposited By: Alexander Fraser
Deposited On: 02 June 2012