PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Significance Tests for Bizarre Measures in 2-Class Classification Tasks
Mikaela Keller, Johnny Mariéthoz and Samy Bengio
(2004) Technical Report. IDIAP RR.


Statistical significance tests are often used in machine learning to compare the performance of two learning algorithms or two models. However, most of these tests assume that the error measure used to assess a model or algorithm is computed as the sum of the errors obtained on each example of the test set. This is not the case for several well-known measures, such as F1, used in text categorization, or DCF, used in person authentication. We propose a practical methodology to either adapt the existing tests or develop non-parametric solutions for such bizarre measures. We furthermore assess the quality of these tests on a large real-life dataset.
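
As an illustration only, the sketch below shows one common non-parametric approach to comparing two models on F1: a paired bootstrap over test examples. The abstract does not say which non-parametric procedure the report uses, so the choice of a bootstrap, the percentile interval, and all function names (f1_score, bootstrap_f1_difference, percentile_interval) are assumptions made here and should not be read as the authors' method. The point it demonstrates is that F1 is recomputed from the counts of each resampled test set, because it cannot be written as a sum of per-example errors.

```python
import random


def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1); returns 0.0 when it is undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_f1_difference(y_true, pred_a, pred_b, n_resamples=2000, seed=0):
    """Return the sorted bootstrap values of F1(A) - F1(B).

    Hypothetical helper: test examples are resampled with replacement,
    and both models are scored on the same resample (paired design).
    """
    rng = random.Random(seed)
    n = len(y_true)
    diffs = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        t = [y_true[i] for i in idx]
        a = [pred_a[i] for i in idx]
        b = [pred_b[i] for i in idx]
        # F1 is recomputed on the whole resample, not averaged per example.
        diffs.append(f1_score(t, a) - f1_score(t, b))
    diffs.sort()
    return diffs


def percentile_interval(sorted_diffs, alpha=0.05):
    """Symmetric percentile interval taken from already sorted bootstrap values."""
    k = int((alpha / 2) * len(sorted_diffs))
    return sorted_diffs[k], sorted_diffs[len(sorted_diffs) - 1 - k]
```

Under these assumptions, if the interval returned by percentile_interval excludes zero, the observed F1 difference would be judged significant at roughly the chosen level. How reliable such a test is in practice is the kind of question the report assesses.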

EPrint Type: Monograph (Technical Report)
Subjects: Learning/Statistics & Optimisation; Information Retrieval & Textual Information Access
ID Code: 205
Deposited By: Mikaela Keller
Deposited On: 07 June 2004