Effective Term Weighting for Sentence Retrieval
Saeedeh Momtazi, Matthew Lease and Dietrich Klakow
In: ECDL 2010, 6 Sept - 10 Sept 2010, Glasgow, UK.
A well-known challenge of information retrieval is how to
infer a user's underlying information need when the input query consists
of only a few keywords. Question Answering (QA) systems face an equally
important but opposite challenge: given a verbose question, how can the
system infer the relative importance of terms in order to differentiate
the core information need from supporting context? We investigate three
simple term-weighting schemes for such estimation within the language
modeling retrieval paradigm. While the three schemes described are
ad hoc, they address a principled estimation problem underlying the
standard word unigram model. We also show these schemes enable better
estimation of a state-of-the-art class model based on term clustering.
Using a TREC QA dataset, we evaluate the three weighting schemes for
both word and class models on the QA subtask of sentence retrieval. Our
inverse sentence frequency weighting scheme achieves over 5% absolute
improvement in mean average precision for the standard word model and
nearly 2% absolute improvement for the class model.
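
As a rough illustration of the idea, the sketch below weights each query term's contribution to a unigram query-likelihood score by an inverse sentence frequency. It is a minimal sketch under stated assumptions, not the formulation evaluated in the paper: the weight log(N / n_t), the Jelinek-Mercer smoothing with parameter lam, and all function names are hypothetical choices made for this example.

```python
import math
from collections import Counter


def isf_weights(sentences):
    """Assumed weight: log(N / n_t), where n_t counts sentences containing t."""
    n = len(sentences)
    sentence_freq = Counter(t for s in sentences for t in set(s))
    return {t: math.log(n / c) for t, c in sentence_freq.items()}


def weighted_log_likelihood(query, sentence, coll_tf, coll_len, weights, lam=0.5):
    """Sum over query terms of weight(t) * log P(t | sentence).

    P(t | sentence) is a unigram estimate smoothed with the collection
    model (Jelinek-Mercer interpolation; an assumption for this sketch).
    """
    tf = Counter(sentence)
    score = 0.0
    for t in query:
        p_sent = tf[t] / len(sentence) if sentence else 0.0
        p_coll = coll_tf[t] / coll_len
        p = (1 - lam) * p_sent + lam * p_coll
        if p > 0:  # terms unseen in the whole collection are skipped
            score += weights.get(t, 0.0) * math.log(p)
    return score


def rank(query, sentences, lam=0.5):
    """Return sentence indices ordered from best to worst score."""
    weights = isf_weights(sentences)
    coll_tf = Counter(t for s in sentences for t in s)
    coll_len = sum(coll_tf.values())
    scored = [
        (weighted_log_likelihood(query, s, coll_tf, coll_len, weights, lam), i)
        for i, s in enumerate(sentences)
    ]
    return [i for _, i in sorted(scored, key=lambda pair: -pair[0])]


sentences = [
    "the capital of france is paris".split(),
    "the weather is nice".split(),
    "the cat is here".split(),
]
question = "what is the capital of france".split()
ranking = rank(question, sentences)
```

In this toy collection, "the" and "is" occur in every sentence and so receive zero weight, leaving "capital", "of", and "france" to drive the ranking of the verbose question.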