No Unbiased Estimator of the Variance of K-Fold Cross-Validation
Yoshua Bengio and Yves Grandvalet
In: NIPS 2003, 09-11 Dec 2003, Vancouver, Canada.
Most machine learning researchers perform quantitative experiments to estimate generalization error and compare the performance of algorithms. In order to draw statistically convincing conclusions, it is important to estimate the uncertainty of such estimates. This paper studies the estimation of uncertainty around the K-fold cross-validation estimator. The main theorem shows that there exists no universal unbiased estimator of the variance of K-fold cross-validation. An analysis based on the eigendecomposition of the covariance matrix of errors helps to better understand the nature of the problem and shows that naive estimators may grossly underestimate variance, as confirmed by numerical experiments.
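To make the setting concrete, the following is a minimal sketch (not from the paper) of the kind of "naive" variance estimator the abstract refers to: it treats the K per-fold errors as i.i.d. and uses s²/K as the variance of the CV score. The toy predictor (training-set mean under squared loss) and all function names are illustrative assumptions; the paper's point is that correlations between folds, which this estimator ignores, make its bias unavoidable in general.

```python
import random
import statistics

def kfold_cv(data, K):
    """Return the K per-fold test errors of K-fold cross-validation.

    Toy setup (an assumption for illustration): the "model" predicts the
    training-set mean, and error is squared loss on the held-out fold.
    """
    data = list(data)
    random.shuffle(data)
    folds = [data[i::K] for i in range(K)]
    errors = []
    for k in range(K):
        train = [x for j, fold in enumerate(folds) if j != k for x in fold]
        mu = statistics.fmean(train)
        errors.append(statistics.fmean((x - mu) ** 2 for x in folds[k]))
    return errors

def naive_variance(errors):
    """Naive variance of the CV score: sample variance of the K fold
    errors divided by K, as if they were independent. Because the folds
    share training data, the errors are positively correlated, so this
    estimate tends to be too small -- the underestimation the abstract
    describes."""
    return statistics.variance(errors) / len(errors)

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(100)]
errs = kfold_cv(data, K=10)
cv_score = statistics.fmean(errs)        # CV estimate of generalization error
naive_var = naive_variance(errs)         # biased uncertainty estimate
```

The theorem in the paper says no estimator built from the same data can correct this bias uniformly over all distributions, because the fold-error correlations cannot be identified from a single K-fold split.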