Benchmarking Non-Parametric Statistical Tests
Mikaela Keller, Samy Bengio and Siew Yeung Wong
In: Advances in Neural Information Processing Systems, NIPS 18 (2005).
Although non-parametric tests have already been proposed for that
purpose, statistical significance tests for non-standard performance
measures (i.e., measures other than the classification error) are
rarely used in the literature. This paper attempts to verify
empirically how these tests compare with more classical tests under
various conditions.
More precisely, using a very large dataset to estimate the whole
"population", we analyzed the behavior of several statistical tests
while varying the class imbalance, the models being compared, the
performance measure, and the sample size. The main result is that,
provided the evaluation sets are large enough, non-parametric tests are relatively reliable in