PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Enhancement of Lexical Concepts Using Cross-lingual Web Mining
Dmitry Davidov and Ari Rappoport
In: EMNLP 2009 (2009).


Sets of lexical items sharing a significant aspect of their meaning (concepts) are fundamental in linguistics and NLP. Manual concept compilation is labor-intensive, error-prone and subjective. We present a web-based concept extension algorithm. Given a set of terms specifying a concept in some language, we translate them to a wide range of intermediate languages, disambiguate the translations using web counts, and discover additional concept terms using symmetric patterns. We then translate the discovered terms back into the original language, score them, and extend the original concept by adding backtranslations having high scores. We evaluate our method in 3 source languages and 45 intermediate languages, using both human judgments and WordNet. In all cases, our cross-lingual algorithm significantly improves high-quality concept extension.
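
The pipeline described in the abstract (translate, discover, back-translate, score, extend) can be sketched roughly as follows. This is only an illustrative outline, not the authors' implementation: the callables `translate`, `discover` and `back_translate` are hypothetical placeholders for the dictionary lookup with web-count disambiguation, the symmetric-pattern mining, and the backtranslation step. The abstract does not give the scoring function, so the sketch assumes a simple stand-in: a candidate's score is the number of intermediate languages that yield it, and `min_support` is an assumed threshold.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable, Iterable, Optional


def extend_concept(
    seed_terms: set[str],
    languages: Iterable[str],
    translate: Callable[[str, str], Optional[str]],
    discover: Callable[[list[str], str], set[str]],
    back_translate: Callable[[str, str], Optional[str]],
    min_support: int = 2,
) -> set[str]:
    """Return the seed concept plus well-supported backtranslated candidates.

    All three callables are hypothetical stand-ins supplied by the caller.
    """
    support: Counter[str] = Counter()
    for lang in languages:
        # Translate the seed terms; disambiguation is assumed to happen
        # inside `translate`, which returns None when no translation exists.
        translated = [t for t in (translate(s, lang) for s in seed_terms) if t]
        if not translated:
            continue
        # Discover new terms in the intermediate language and map them back.
        candidates = set()
        for found in discover(translated, lang):
            back = back_translate(found, lang)
            if back and back not in seed_terms:
                candidates.add(back)
        # Assumed scoring: each language votes at most once per candidate.
        support.update(candidates)
    accepted = {term for term, votes in support.items() if votes >= min_support}
    return seed_terms | accepted
```

Counting each language at most once per candidate keeps a single noisy intermediate language from dominating the score; any other scoring scheme could be substituted at the `support.update` step.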

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Natural Language Processing
Information Retrieval & Textual Information Access
ID Code: 5564
Deposited By: Ari Rappoport
Deposited On: 04 March 2010