PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

A Comparative Study on Word Co-occurrence for Term Clustering in Language Model-based Sentence Retrieval
Saeedeh Momtazi, Sanjeev Khudanpur and Dietrich Klakow
In: NAACL 2010, 1 June - 6 June 2010, Los Angeles, CA, USA.


Sentence retrieval is a key component of question answering systems. Term clustering, in turn, is an effective approach for improving sentence retrieval performance: the more similar the terms in each cluster, the better the performance of the retrieval system. A key step in obtaining appropriate word clusters is accurate estimation of pairwise word similarities, based on their tendency to co-occur in similar contexts. In this paper, we compare four different methods for estimating word co-occurrence frequencies from two different corpora. The results show that different, commonly used contexts for defining word co-occurrence differ significantly in retrieval performance. Using an appropriate co-occurrence criterion and corpus is shown to improve the mean average precision of sentence retrieval from 36.8% to 42.1%.
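
To make the notion of a co-occurrence frequency concrete, the following minimal sketch counts how often two words appear in the same sentence. It is an illustration only: the abstract does not specify the four context definitions the paper compares, so the sentence-level context used here is an assumption, and the function name `cooccurrence_counts` and the toy corpus are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count how often each unordered word pair appears in the same sentence.

    Assumption: the co-occurrence context is a single sentence and tokens
    are whitespace-separated. Other contexts (documents, fixed windows)
    would yield different counts.
    """
    counts = Counter()
    for sentence in sentences:
        # Deduplicate and sort tokens so each pair is counted once per
        # sentence and always stored in the same (alphabetical) order.
        words = sorted(set(sentence.lower().split()))
        for first, second in combinations(words, 2):
            counts[(first, second)] += 1
    return counts


# Toy corpus invented purely for illustration.
toy_corpus = [
    "the inventor built the telephone",
    "the inventor patented the telephone",
    "the telephone rang",
]
pair_counts = cooccurrence_counts(toy_corpus)
```

Counts of this kind could then feed a pairwise similarity estimate and a clustering step; which similarity measure and clustering algorithm to use is left open here, since the abstract does not state them.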

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Natural Language Processing; Information Retrieval & Textual Information Access
ID Code: 8867
Deposited By: Diana Schreyer
Deposited On: 21 February 2012