A data-dependent generalisation error bound for the AUC
Nicolas Usunier, Massih Amini and Patrick Gallinari
In: ROCML ICML 2005 Workshop, 8-12 August 2005, Bonn, Germany.
In this paper, we study the generalisation properties of the Area Under the ROC Curve (AUC). Optimisation of the AUC has recently been proposed for learning ranking functions. However, estimating the true AUC of a function, which depends on the true distribution of examples, from its empirical value computed on a training set is still an open problem. We present the first data-dependent generalisation error bound for the AUC. This bound has the advantage of being tight, and it also allows practical conclusions to be drawn about learning algorithms that optimise the AUC. In particular, we show that, for the AUC, kernel function classes have strong generalisation guarantees provided that the weights of the functions are small. This suggests that regularisation procedures which limit the norm of the weight vector may lead to better generalisation performance for algorithms that optimise the AUC.
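
As an illustration of the empirical quantity referred to above, the following minimal sketch computes the empirical AUC of a scoring function on a labelled sample as the fraction of (positive, negative) pairs that the scores order correctly. The function name `empirical_auc`, the label encoding (1 for positive, anything else for negative) and the convention of counting ties as one half are assumptions made for this example; they are not taken from the paper.

```python
def empirical_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly by the scores.

    Assumptions for this sketch: label 1 marks a positive example, any other
    label marks a negative one, and tied scores count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0   # positive scored above negative
            elif p == n:
                correct += 0.5   # tie: counted as half (a convention)
    return correct / (len(pos) * len(neg))


if __name__ == "__main__":
    # Two positives (0.9, 0.3) and two negatives (0.8, 0.1):
    # three of the four pairs are correctly ordered.
    print(empirical_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

A generalisation bound of the kind announced in the abstract relates this sample quantity to the AUC under the true distribution of examples.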