PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Apprentissage Statistique pour la Constitution de Corpus d'évaluation [Statistical Learning for Building Evaluation Corpora]
Huyen-Trang Vu and Patrick Gallinari
I3 (Information - Interaction - Intelligence) Volume 7, Number 1, 2007.


Test collections play a crucial role in the evaluation of information retrieval systems. Acquiring relevance assessments is recognized as the most expensive part of building a test collection, especially for very large document collections. This paper presents a new method for efficiently selecting the documents that make up the assessment set. The method is based on machine learning algorithms that directly learn to rank documents according to their relevance to the queries. It produces smaller pools than traditional round-robin pooling or alternative proposals, and thus significantly reduces the manual assessment workload. Experimental results on TREC collections consistently demonstrate the effectiveness of the approach under several evaluation criteria.
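For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of round-robin pooling, not the learning-to-rank method of the paper. It assumes each system run is a ranked list of document identifiers for one query; the function name, the `budget` parameter, and the toy document identifiers are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def round_robin_pool(runs: Dict[str, List[str]], budget: int) -> List[str]:
    """Build an assessment pool by visiting the runs in turn.

    At each rank depth, the document at that rank is taken from every run
    (in a fixed run order) and added to the pool if it is not already there.
    Pooling stops when `budget` distinct documents have been collected or
    when every run is exhausted.
    """
    pool: List[str] = []
    seen = set()
    max_depth = max((len(ranking) for ranking in runs.values()), default=0)
    depth = 0
    while len(pool) < budget and depth < max_depth:
        for name in sorted(runs):  # fixed order so the result is reproducible
            ranking = runs[name]
            if depth >= len(ranking):
                continue  # this run has no document at this depth
            doc = ranking[depth]
            if doc in seen:
                continue  # already pooled via another run
            seen.add(doc)
            pool.append(doc)
            if len(pool) == budget:
                return pool
        depth += 1
    return pool


# Toy example: two hypothetical runs for a single query.
example_runs = {
    "run_a": ["d1", "d2", "d3"],
    "run_b": ["d2", "d4", "d1"],
}
example_pool = round_robin_pool(example_runs, budget=3)
```

In this baseline every run contributes equally at each depth, so the pool grows with the number of runs regardless of how likely each document is to be relevant. The abstract's claim is that ranking candidate documents by learned relevance yields a smaller pool for the assessors.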

EPrint Type: Article
Additional Information: Extended version of the CORIA'06 paper
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Information Retrieval & Textual Information Access
ID Code: 2621
Deposited By: Huyen-Trang Vu
Deposited On: 22 November 2006