Residual Variance Estimation in Machine Learning
Elia Liitiäinen, Michel Verleysen, Francesco Corona and Amaury Lendasse
The problem of residual variance estimation consists of estimating the best possible
generalization error obtainable by any model based on a finite sample of data.
Even though it is a natural generalization of linear correlation, residual variance
estimation in its general form has attracted relatively little attention in machine
learning.
In this paper, we examine four different residual variance estimators and analyze
their properties both theoretically and experimentally to better understand
their applicability in machine learning problems. The theoretical treatment differs
from previous work in that it is based on a general formulation of the problem that
also covers heteroscedastic noise, whereas earlier analyses concentrate on
homoscedastic and additive noise.
In the second part of the paper, we demonstrate practical applications in input
and model structure selection. The experimental results show that using residual
variance estimators in these tasks often gives good results with reduced
computational complexity, while the nearest neighbor estimators are simple and easy to