PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Combining Wikipedia-Based Concept Models for Cross-Language Retrieval
Benjamin Roth and Dietrich Klakow
In: Information Retrieval Facility Conference 2010 (2010).

Abstract

As a low-cost, up-to-date resource, Wikipedia has recently gained attention as a means of providing cross-language bridging for information retrieval. Contrary to a previous study, we show that standard Latent Dirichlet Allocation (LDA) can extract cross-language information that is valuable for IR simply by normalizing the training data. Furthermore, we show that LDA and Explicit Semantic Analysis (ESA) complement each other, yielding significant improvements when combined. Such a combination can contribute significantly to retrieval based on machine translation, especially when query translations contain errors. The experiments were performed on the Multext JOC corpus and a CLEF dataset.
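The abstract does not state how the LDA and ESA models are combined. The following is a minimal sketch under stated assumptions, not the authors' method. It assumes that both models map a query in one language and a document in another into a shared concept space: LDA topics, or Wikipedia concepts aligned across languages. It also assumes that the two models' cosine similarities are merged by linear interpolation. The function names, the mixing parameter `weight`, and the toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length concept vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def combined_score(q_lda, d_lda, q_esa, d_esa, weight=0.5):
    """Hypothetical linear interpolation of LDA and ESA similarities.

    `weight` is an assumed mixing parameter, not a value taken from the paper.
    """
    return weight * cosine(q_lda, d_lda) + (1.0 - weight) * cosine(q_esa, d_esa)

def rank(query, docs, weight=0.5):
    """Return document ids sorted by descending combined score.

    `query` and each entry of `docs` are dicts holding an 'lda' topic vector and
    an 'esa' concept vector; docs also carry an 'id'. This layout is an assumption.
    """
    ranked = sorted(
        docs,
        key=lambda d: combined_score(query["lda"], d["lda"], query["esa"], d["esa"], weight),
        reverse=True,
    )
    return [d["id"] for d in ranked]

# Toy example (invented numbers): an English query against two German documents,
# each already projected into the shared topic (LDA) and concept (ESA) spaces.
query = {"lda": [0.7, 0.2, 0.1], "esa": [0.9, 0.0, 0.1, 0.0]}
docs = [
    {"id": "de_2", "lda": [0.1, 0.1, 0.8], "esa": [0.0, 0.2, 0.1, 0.7]},
    {"id": "de_1", "lda": [0.6, 0.3, 0.1], "esa": [0.8, 0.1, 0.0, 0.1]},
]
print(rank(query, docs))  # de_1 ranks first: it is close to the query in both spaces
```

Interpolation is used here only because it is the simplest way to let two complementary similarity signals compensate for each other. A real system might tune `weight` on held-out queries or also interpolate with a machine-translation-based retrieval score.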

EPrint Type: Conference or Workshop Item (Paper)
Subjects: Natural Language Processing; Information Retrieval & Textual Information Access
ID Code: 8860
Deposited By: Grzegorz Chrupala
Deposited On: 21 February 2012