A Statistical Model for Unsupervised and Semi-supervised Transliteration Mining
Hassan Sajjad, Alexander Fraser and Helmut Schmid
In: ACL 2012, 8-14 July 2012, Jeju, Republic of Korea.
We propose a novel model to automatically extract transliteration
pairs from parallel corpora. Our model is efficient, language-pair
independent, and mines transliteration pairs consistently in
both unsupervised and semi-supervised settings. We model
transliteration mining as an interpolation of transliteration and
non-transliteration sub-models. We evaluate it on the NEWS 2010 shared task
data and on parallel corpora, obtaining competitive results.
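
To make the interpolation idea concrete, here is a minimal sketch of how two sub-model scores for a word pair might be combined and used to decide whether the pair is a transliteration. The abstract does not give the formula, so everything here is an assumption for illustration: a linear mixture with weight lam on the non-transliteration sub-model, the function names, and the 0.5 decision threshold are all hypothetical. The sub-model probabilities are passed in as plain numbers; how they are estimated is outside the scope of this sketch.

```python
def interpolate(p_translit: float, p_non_translit: float, lam: float) -> float:
    """Linear mixture of two sub-model probabilities for one word pair.

    lam is the (assumed) weight of the non-transliteration sub-model,
    so the transliteration sub-model gets weight 1 - lam.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * p_translit + lam * p_non_translit


def posterior_non_translit(p_translit: float, p_non_translit: float, lam: float) -> float:
    """Posterior probability that the pair came from the non-transliteration component."""
    mix = interpolate(p_translit, p_non_translit, lam)
    if mix == 0.0:
        # Neither sub-model assigns any mass: fall back to the prior weight.
        return lam
    return lam * p_non_translit / mix


def is_transliteration(p_translit: float, p_non_translit: float, lam: float,
                       threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: accept the pair when the posterior of the
    non-transliteration component falls below the threshold."""
    return posterior_non_translit(p_translit, p_non_translit, lam) < threshold


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the paper.
    print(is_transliteration(p_translit=0.2, p_non_translit=0.1, lam=0.5))
```

With equal weights, a pair that the transliteration sub-model scores higher than the non-transliteration sub-model is accepted; raising lam makes the rule more conservative.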