Statistical Learning for Building Evaluation Corpora
Test collections play a crucial role in the evaluation of Information Retrieval systems. Forming the set of relevance assessments is recognized as the key bottleneck in building a test collection, especially for very large document collections. This paper addresses the problem of efficiently selecting the documents to include in the assessment set. We show that machine learning algorithms such as RankBoost can help with this selection. The resulting pools are smaller than those produced by traditional round robin pooling, which significantly reduces the manual assessment workload. Experimental results on TREC collections consistently demonstrate the effectiveness of our approach under several evaluation criteria.
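For readers unfamiliar with the baseline, the following is a minimal sketch of round robin pooling as it is commonly understood: the ranked lists submitted by the participating systems are visited in turn, one rank at a time, and each previously unseen document is added to the pool until a size budget is reached. The function name `round_robin_pool`, the `pool_size` parameter, and the toy runs are assumptions made for illustration; they are not taken from the paper, and the sketch does not represent the RankBoost-based selection proposed here.

```python
from typing import Dict, List


def round_robin_pool(runs: Dict[str, List[str]], pool_size: int) -> List[str]:
    """Build a pool by visiting each run in turn, one rank at a time.

    `runs` maps a (hypothetical) system identifier to its ranked list of
    document identifiers for a single topic. Duplicates are added once.
    """
    if pool_size <= 0:
        return []
    pool: List[str] = []
    seen = set()
    # Deepest rank available in any run (0 if there are no runs).
    max_depth = max((len(ranking) for ranking in runs.values()), default=0)
    for rank in range(max_depth):
        # Sorted order keeps the traversal deterministic.
        for run_id in sorted(runs):
            ranking = runs[run_id]
            if rank >= len(ranking):
                continue  # this run is shorter than the current depth
            doc = ranking[rank]
            if doc not in seen:
                seen.add(doc)
                pool.append(doc)
                if len(pool) >= pool_size:
                    return pool
    return pool


# Toy example with three hypothetical systems.
runs = {
    "sysA": ["d1", "d2", "d3"],
    "sysB": ["d2", "d4", "d1"],
    "sysC": ["d5", "d1", "d6"],
}
pool = round_robin_pool(runs, pool_size=4)
print(pool)  # ['d1', 'd2', 'd5', 'd4']
```

Every document in such a pool must be judged manually, so the pool size translates directly into assessor workload. This is the cost that a learned selection strategy aims to lower.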