Generalization error estimation under covariate shift
This is the latest version of this eprint.
In supervised learning, it is almost always assumed that the training and test input points follow the same probability distribution. However, this assumption is violated, e.g., in interpolation, extrapolation, active learning, or classification with imbalanced data. In such situations, known as covariate shift, the cross-validation estimate of the generalization error is biased, which results in poor model selection. In this paper, we propose an alternative estimator of the generalization error that, under covariate shift, is exactly unbiased if the model includes the learning target function and is asymptotically unbiased in general. We also show that, in addition to being unbiased, the proposed estimator can accurately estimate the difference in generalization error between models, which is a desirable property in model selection. Numerical studies show that the proposed method compares favorably with cross-validation.
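The bias described above can be illustrated with a small simulation. The sketch below is not the estimator proposed in the paper. It only contrasts a plain hold-out error with an importance-weighted hold-out error, a standard correction that assumes the ratio of test to training input densities is known. All names (`run_demo`, `gaussian_pdf`), the quadratic target, the linear model, and the Gaussian input distributions are assumptions chosen for illustration.

```python
import numpy as np


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def run_demo(seed=0, n_train=500, n_val=20000, n_test=200000):
    """Compare hold-out error estimates under a synthetic covariate shift.

    Returns (plain, weighted, test_error), all mean squared errors.
    """
    rng = np.random.default_rng(seed)
    noise_std = 0.1
    # Assumed setup: training inputs are spread out, test inputs are
    # concentrated elsewhere; the target function is the same for both.
    train_mean, train_std = 1.0, 1.0
    test_mean, test_std = 2.0, 0.3

    def sample(mean, std, n):
        x = rng.normal(mean, std, n)
        y = x ** 2 + rng.normal(0.0, noise_std, n)
        return x, y

    x_tr, y_tr = sample(train_mean, train_std, n_train)
    x_val, y_val = sample(train_mean, train_std, n_val)  # training distribution
    x_te, y_te = sample(test_mean, test_std, n_test)     # test distribution

    # Misspecified model: a straight line fitted to a quadratic target.
    slope, intercept = np.polyfit(x_tr, y_tr, deg=1)

    def predict(x):
        return slope * x + intercept

    val_loss = (predict(x_val) - y_val) ** 2

    # Plain hold-out estimate: ignores the shift in the input distribution.
    plain = val_loss.mean()

    # Importance-weighted estimate: reweights each hold-out loss by the
    # (here assumed known) density ratio p_test(x) / p_train(x).
    weights = gaussian_pdf(x_val, test_mean, test_std) / gaussian_pdf(
        x_val, train_mean, train_std
    )
    weighted = (weights * val_loss).mean()

    # Reference value: error measured on a large sample from the test distribution.
    test_error = ((predict(x_te) - y_te) ** 2).mean()
    return plain, weighted, test_error


if __name__ == "__main__":
    plain, weighted, test_error = run_demo()
    print(f"plain hold-out error:    {plain:.3f}")
    print(f"weighted hold-out error: {weighted:.3f}")
    print(f"error on test inputs:    {test_error:.3f}")
```

In this setup the plain estimate reflects the error over the training input region, while the weighted estimate tracks the error over the test input region. In practice the density ratio is usually unknown and must itself be estimated, which this sketch does not attempt.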