PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Comparing Two Techniques for Learning Transliteration Models Using a Parallel Corpus
Hassan Sajjad, Nadir Durrani, Helmut Schmid and Alexander Fraser
In: The 5th International Joint Conference on Natural Language Processing (IJCNLP 2011), 9-11 November 2011, Chiang Mai, Thailand.

Abstract

We compare an unsupervised transliteration mining method with a rule-based method for automatically extracting lists of transliteration word pairs from a Hindi/Urdu parallel corpus. For each extracted list, we automatically align the orthographic transliteration units and build a joint source-channel model on them, resulting in two transliteration systems. We compare our systems with three transliteration systems available on the web and show that ours perform better. An extensive analysis of the results of both methods provides evidence that the unsupervised transliteration mining method is superior for applications requiring high-recall transliteration lists, while the rule-based method is useful for obtaining high-precision lists.
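
As a rough illustration of what a joint source-channel model over aligned transliteration units can look like, the sketch below trains a bigram model over (source unit, target unit) pairs and scores a candidate sequence. It is a minimal sketch under stated assumptions, not the system described in the paper: the bigram order, the add-alpha smoothing, the `START` marker, the function names, and the toy data (ASCII placeholders, not real Hindi/Urdu units) are all hypothetical choices made for the example.

```python
import math
from collections import Counter

# Hypothetical start-of-word marker; not taken from the paper.
START = ("<s>", "<s>")

# Toy aligned data: each word pair is a sequence of (source, target) units.
# The symbols are ASCII placeholders, not real Hindi/Urdu characters.
TOY_PAIRS = [
    [("k", "K"), ("a", "A"), ("r", "R")],
    [("k", "K"), ("a", "A"), ("m", "M")],
]


def train_bigram(aligned_pairs):
    """Count contexts and bigrams over sequences of (source, target) units."""
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for units in aligned_pairs:
        prev = START
        for unit in units:
            context_counts[prev] += 1
            bigram_counts[(prev, unit)] += 1
            vocab.add(unit)
            prev = unit
    return context_counts, bigram_counts, vocab


def log_prob(units, model, alpha=1.0):
    """Joint log-probability of a unit sequence with add-alpha smoothing."""
    context_counts, bigram_counts, vocab = model
    size = len(vocab) + 1  # one extra slot reserves mass for unseen units
    total = 0.0
    prev = START
    for unit in units:
        numerator = bigram_counts[(prev, unit)] + alpha
        denominator = context_counts[prev] + alpha * size
        total += math.log(numerator / denominator)
        prev = unit
    return total


model = train_bigram(TOY_PAIRS)
seen_score = log_prob(TOY_PAIRS[0], model)
unseen_score = log_prob(list(reversed(TOY_PAIRS[0])), model)
```

Here `seen_score` is higher than `unseen_score` because the first sequence follows unit-pair transitions observed in training, while the reversed sequence does not. A full transliteration system would also need a decoder that searches over target units for a given source word, which this sketch omits.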

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Natural Language Processing
ID Code: 9221
Deposited By: Alexander Fraser
Deposited On: 21 February 2012