Enhancement of Lexical Concepts Using Cross-lingual Web Mining
Dmitry Davidov and Ari Rappoport
In: EMNLP 2009 (2009).
Sets of lexical items sharing a significant
aspect of their meaning (concepts) are fundamental
in linguistics and NLP. Manual
concept compilation is labor-intensive, error-prone,
and subjective. We present a
web-based concept extension algorithm.
Given a set of terms specifying a concept
in some language, we translate them to
a wide range of intermediate languages,
disambiguate the translations using web
counts, and discover additional concept
terms using symmetric patterns. We then
translate the discovered terms back into
the original language, score them, and extend
the original concept by adding backtranslations
having high scores. We evaluate
our method in 3 source languages and
45 intermediate languages, using both human
judgments and WordNet. In all cases,
our cross-lingual algorithm significantly
improves high-quality concept extension.
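
The translate, discover, back-translate and score loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the lookup tables FORWARD, NEIGHBORS and BACKWARD are hypothetical stand-ins for dictionary translation, web-count disambiguation and symmetric-pattern discovery, and the score (the number of intermediate languages that support a back-translation) is a simple placeholder, since the abstract does not specify the scoring function.

```python
from collections import Counter

# Hypothetical toy data. In the described method these steps rely on
# dictionaries and web queries; here they are plain lookup tables.
FORWARD = {  # source term -> (already disambiguated) translation
    "es": {"apple": "manzana", "pear": "pera"},
    "de": {"apple": "Apfel", "pear": "Birne"},
    "fr": {"apple": "pomme", "pear": "poire"},
}
NEIGHBORS = {  # terms "discovered" next to each translated term
    "es": {"manzana": ["ciruela"], "pera": ["ciruela", "cereza"]},
    "de": {"Apfel": ["Pflaume", "Birne"], "Birne": ["Kirsche"]},
    "fr": {"pomme": ["prune"], "poire": ["pomme"]},
}
BACKWARD = {  # intermediate-language term -> source-language term
    "es": {"ciruela": "plum", "cereza": "cherry"},
    "de": {"Pflaume": "plum", "Kirsche": "cherry"},
    "fr": {"prune": "plum"},
}


def extend_concept(seed, forward, neighbors, backward, min_support=2):
    """Return new source-language terms supported by enough languages."""
    seed_set = set(seed)
    votes = Counter()
    for lang, dictionary in forward.items():
        # 1. Translate the seed terms into the intermediate language.
        translated = {dictionary[t] for t in seed if t in dictionary}
        # 2. Collect terms discovered alongside the translations,
        #    dropping the translated seed terms themselves.
        found = set()
        for term in translated:
            found.update(neighbors[lang].get(term, []))
        found -= translated
        # 3. Back-translate; each language votes at most once per term.
        back = {backward[lang][t] for t in found if t in backward[lang]}
        votes.update(back - seed_set)
    # 4. Keep back-translations whose score reaches the threshold.
    return sorted(t for t, n in votes.items() if n >= min_support)


print(extend_concept(["apple", "pear"], FORWARD, NEIGHBORS, BACKWARD))
```

In this toy setup, raising min_support keeps only candidates confirmed by more intermediate languages, trading coverage for precision.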