Bilingual Lexicon Generation Using Non-Aligned Signatures
Daphna Shezaf and Ari Rappoport
Bilingual lexicons are fundamental resources.
Modern automated lexicon generation
methods usually require parallel
corpora, which are not available for most
language pairs. Lexicons can be generated
using non-parallel corpora or a pivot
language, but such lexicons are noisy.
We present an algorithm for generating
a high-quality lexicon from a noisy one,
requiring only an independent corpus
for each language. Our algorithm introduces
non-aligned signatures (NAS), a
cross-lingual word context similarity score
that avoids the over-constrained and inefficient
nature of alignment-based methods.
We use NAS to eliminate incorrect translations
from the generated lexicon. We evaluate
our method by improving the quality
of noisy Spanish-Hebrew lexicons generated
from two pivot English lexicons. Our
algorithm substantially outperforms other
lexicon generation methods.
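
As an illustration of why pivot-generated lexicons are noisy, the following minimal sketch composes a source-to-pivot lexicon with a pivot-to-target lexicon. The function name `pivot_lexicon`, the dictionary layout, and the toy entries (with Hebrew words transliterated) are assumptions made for this example only and are not taken from our implementation or data.

```python
from collections import defaultdict


def pivot_lexicon(src_to_pivot, pivot_to_tgt):
    """Compose two lexicons through a shared pivot language.

    Both arguments map a word to a set of translations. Every target
    word reachable through any pivot word becomes a candidate, so
    ambiguity in the pivot language introduces incorrect candidates.
    """
    candidates = defaultdict(set)
    for src_word, pivot_words in src_to_pivot.items():
        for pivot_word in pivot_words:
            candidates[src_word].update(pivot_to_tgt.get(pivot_word, ()))
    return dict(candidates)


# Toy data (hypothetical): Spanish -> English and English -> Hebrew.
es_en = {"banco": {"bank", "bench"}}
en_he = {
    "bank": {"bank", "gada"},   # financial bank; riverbank
    "bench": {"safsal"},        # bench
}

noisy = pivot_lexicon(es_en, en_he)
```

In this toy case the candidate set for "banco" contains "gada" (riverbank), which enters only because English "bank" is ambiguous. Removing candidates of this kind is the filtering step that a context similarity score such as NAS is meant to support.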