Working Notes for the InFile Campaign: Online Document Filtering Using 1 Nearest Neighbor
Vincent Bodinier, Ali Mustafa Qamar and Eric Gaussier
In: CLEF 2008 Workshop, 17-19 September, Aarhus, Denmark.
This paper was written as part of the InFile (INFormation, FILtering, Evaluation) campaign, a cross-language adaptive filtering evaluation campaign sponsored by the French National Research Agency and run as a pilot track of the CLEF (Cross Language Evaluation Forum) 2008 campaigns. We propose an online algorithm that learns category-specific thresholds in a multiclass environment where a document can belong to more than one class. Our method uses the 1 Nearest Neighbor (1NN) algorithm for classification and relies on simulated user feedback to fine-tune the thresholds, and in turn the classification performance, over time. The experiments were run on an English-language corpus containing 100,000 documents. The best run achieves a precision of 0.366 and a recall of 0.260.
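To make the idea concrete, the following is a minimal sketch of online 1NN filtering with a per-category threshold, not the implementation evaluated in the paper. Everything specific in it is an assumption made for illustration: the class name `OnlineTopicFilter`, the bag-of-words cosine similarity, the fixed `step` used to move the threshold, and the choice to use feedback only for delivered documents.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def vectorize(text):
    """Hypothetical preprocessing: lowercase, whitespace-split term counts."""
    return Counter(text.lower().split())


class OnlineTopicFilter:
    """One filter per category; a document may pass several filters."""

    def __init__(self, seed_docs, threshold=0.5, step=0.05):
        self.examples = [vectorize(d) for d in seed_docs]
        self.threshold = threshold  # category-specific threshold
        self.step = step            # assumed fixed update size

    def score(self, text):
        # 1NN: similarity to the single closest known relevant document.
        vec = vectorize(text)
        return max((cosine(vec, e) for e in self.examples), default=0.0)

    def process(self, text, is_relevant):
        """Filter one document, then adapt using simulated feedback."""
        delivered = self.score(text) >= self.threshold
        if delivered:  # assumption: feedback exists only for delivered docs
            if is_relevant:
                # Correct delivery: remember the document, relax the threshold.
                self.examples.append(vectorize(text))
                self.threshold = max(0.0, self.threshold - self.step)
            else:
                # False alarm: tighten the threshold.
                self.threshold = min(1.0, self.threshold + self.step)
        return delivered


topic = OnlineTopicFilter(["stock market crash"], threshold=0.5, step=0.1)
decisions = [
    topic.process("stock market crash today", True),
    topic.process("football match result", False),
    topic.process("market", False),
]
```

In a multiclass setting, one such filter would be kept per category and each incoming document passed through all of them, so that a document can be delivered to several categories at once.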