Statistical Learning for Building Evaluation Corpora
Test collections play a crucial role in information retrieval system evaluation. The acquisition of relevance assessments has been recognized as the most expensive part of test collection building, especially for very large document collections. This paper presents a new method for efficiently selecting documents for the assessment set. The method is based on machine learning algorithms that directly learn to rank documents according to their relevance to the queries. It yields smaller pools than traditional round robin pooling or alternative proposals, and thus significantly reduces the manual assessment workload. Experimental results on TREC collections consistently demonstrate the effectiveness of our approach according to different evaluation criteria.
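To make the contrast concrete, the following sketch shows traditional round robin pooling next to a score-based selection in the spirit of the abstract. This is an illustrative assumption, not the paper's exact algorithm: the system names, document ids, and the `scores` dictionary (standing in for a learned ranker's output) are all hypothetical.

```python
def round_robin_pool(runs, pool_size):
    """Traditional round robin pooling: take one document from each
    system's ranked run in turn, skipping duplicates, until the pool
    reaches pool_size or every run is exhausted."""
    pool, seen = [], set()
    depth = 0
    while len(pool) < pool_size:
        advanced = False
        for ranking in runs.values():
            if depth < len(ranking):
                advanced = True
                doc = ranking[depth]
                if doc not in seen and len(pool) < pool_size:
                    seen.add(doc)
                    pool.append(doc)
        if not advanced:  # all runs exhausted before the pool filled
            break
        depth += 1
    return pool


def score_based_pool(scores, pool_size):
    """Alternative selection: rank candidate documents by a predicted
    relevance score and keep only the top ones, aiming at a smaller
    pool for the same assessment value.  `scores` maps each document
    id to a (hypothetical) model score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:pool_size]


# Two hypothetical system runs over a tiny document set.
runs = {
    "sysA": ["d1", "d2", "d3", "d4"],
    "sysB": ["d2", "d5", "d1", "d6"],
}
print(round_robin_pool(runs, 4))  # -> ['d1', 'd2', 'd5', 'd3']
print(score_based_pool({"d1": 0.9, "d2": 0.2, "d5": 0.6}, 2))  # -> ['d1', 'd5']
```

Round robin treats every run equally regardless of quality; a learned ranking concentrates assessor effort on the documents most likely to be relevant, which is how the pool can shrink without losing evaluation reliability.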