Comparing Two Techniques for Learning Transliteration Models Using a Parallel Corpus
Hassan Sajjad, Nadir Durrani, Helmut Schmid and Alexander Fraser
In: The 5th International Joint Conference on Natural Language Processing (IJCNLP 2011), 9-11 Nov 2011, Chiang Mai, Thailand.
We compare the use of an unsupervised transliteration mining method
and a rule-based method to automatically extract lists of
transliteration word pairs from a Hindi/Urdu parallel corpus. For each
extracted list, we automatically align the orthographic transliteration
units of its word pairs and build a joint source-channel model on them,
resulting in two transliteration systems. We compare our systems with
three transliteration systems
available on the web and show that our systems perform better. An
extensive analysis of the results of both methods provides evidence
that the unsupervised transliteration mining method is superior for
applications requiring high-recall transliteration lists, while the
rule-based method is useful for obtaining high-precision lists.
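
As a rough illustration of what a joint source-channel model over aligned transliteration units can look like, the sketch below trains a bigram model whose tokens are (source unit, target unit) pairs and scores a candidate alignment. It is a minimal sketch under stated assumptions, not the authors' implementation: the bigram order, the add-one smoothing, the function names `train_joint_bigram` and `log_prob`, and the toy romanized units are all hypothetical choices made for the example.

```python
import math
from collections import Counter

START = ("<s>", "<s>")
END = ("</s>", "</s>")

def train_joint_bigram(aligned_words):
    """Count bigrams over (source_unit, target_unit) tokens.

    aligned_words: list of words, each a list of (src, tgt) unit pairs.
    The bigram order is an assumption made for this sketch.
    """
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for word in aligned_words:
        seq = [START] + list(word) + [END]
        vocab.update(seq[1:])  # every token that can be predicted
        for prev, cur in zip(seq, seq[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab

def log_prob(word, model):
    """Add-one smoothed log-probability of an aligned unit sequence."""
    context_counts, bigram_counts, vocab = model
    seq = [START] + list(word) + [END]
    total = 0.0
    for prev, cur in zip(seq, seq[1:]):
        numerator = bigram_counts[(prev, cur)] + 1
        denominator = context_counts[prev] + len(vocab)
        total += math.log(numerator / denominator)
    return total

# Toy, hypothetical aligned units (placeholders, not real Hindi/Urdu data).
toy_pairs = [
    [("a", "A"), ("b", "B")],
    [("a", "A"), ("c", "C")],
]
model = train_joint_bigram(toy_pairs)
seen = log_prob([("a", "A"), ("b", "B")], model)
unseen = log_prob([("b", "B"), ("a", "A")], model)
print(seen > unseen)
```

Because each token couples a source unit with its target unit, the model scores source and target jointly; a decoder would search over target units for a given source string, which this sketch leaves out.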