PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Effective Term Weighting for Sentence Retrieval
Saeedeh Momtazi, Matthew Lease and Dietrich Klakow
In: ECDL 2010, 6 Sept - 10 Sept 2010, Glasgow, UK.


A well-known challenge of information retrieval is how to infer a user's underlying information need when the input query consists of only a few keywords. Question Answering (QA) systems face an equally important but opposite challenge: given a verbose question, how can the system infer the relative importance of terms in order to differentiate the core information need from supporting context? We investigate three simple term-weighting schemes for such estimation within the language modeling retrieval paradigm [6]. While the three schemes described are ad hoc, they address a principled estimation problem underlying the standard word unigram model. We also show these schemes enable better estimation of a state-of-the-art class model based on term clustering [5]. Using a TREC QA dataset, we evaluate the three weighting schemes for both word and class models on the QA subtask of sentence retrieval. Our inverse sentence frequency weighting scheme achieves over 5% absolute improvement in mean-average precision for the standard word model and nearly 2% absolute improvement for the class model.
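The abstract does not give the weighting formulas. A minimal sketch follows. It assumes that inverse sentence frequency is defined by analogy with IDF as ISF(t) = log(N / sf(t)), where N is the number of sentences and sf(t) is the number of sentences containing t. It also assumes that each query term's log-probability under a Dirichlet-smoothed sentence unigram model is scaled by that term's weight. The function names, the smoothing parameter `mu`, and the toy sentences are hypothetical illustrations, not the authors' implementation.

```python
import math
from collections import Counter


def inverse_sentence_frequency(sentences):
    """Hypothetical ISF by analogy with IDF: log(N / sf(t))."""
    n = len(sentences)
    sf = Counter()
    for tokens in sentences:
        sf.update(set(tokens))  # count each term at most once per sentence
    return {t: math.log(n / df) for t, df in sf.items()}


def weighted_query_likelihood(query, sentence, coll_counts, coll_len, weights, mu=100.0):
    """Assumed scoring: sum_q w(q) * log P(q | S), where
    P(q | S) = (c(q, S) + mu * P(q | C)) / (|S| + mu)  (Dirichlet smoothing)."""
    counts = Counter(sentence)
    length = len(sentence)
    score = 0.0
    for q in query:
        p_coll = coll_counts.get(q, 0) / coll_len
        if p_coll == 0.0:
            continue  # term unseen in the collection: skip rather than take log(0)
        p = (counts[q] + mu * p_coll) / (length + mu)
        score += weights.get(q, 0.0) * math.log(p)
    return score


# Toy collection of tokenized sentences (hypothetical data).
sentences = [
    "the eiffel tower is in paris".split(),
    "paris is the capital of france".split(),
    "the tower was built in 1889".split(),
]
coll = Counter(t for s in sentences for t in s)
total = sum(coll.values())
isf = inverse_sentence_frequency(sentences)

# A verbose question: "the" occurs in every sentence, so its ISF weight is 0.
query = "when was the eiffel tower built".split()
ranked = sorted(
    range(len(sentences)),
    key=lambda i: weighted_query_likelihood(query, sentences[i], coll, total, isf),
    reverse=True,
)
print(ranked)
```

In this toy run, terms that occur in every sentence receive zero weight and stop influencing the ranking, while rarer question terms such as "built" and "eiffel" dominate it. That behavior matches the abstract's goal of separating the core information need from supporting context.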

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Natural Language Processing; Information Retrieval & Textual Information Access
ID Code: 8868
Deposited By: Diana Schreyer
Deposited On: 21 February 2012