Can Gaussian Process Regression Be Made Robust Against Model Mismatch?
Lecture Notes in Computer Science
Learning curves for Gaussian process (GP) regression can be strongly
affected by a mismatch between the `student' model and the `teacher'
(true data generation process), exhibiting e.g. multiple overfitting
maxima and logarithmically slow learning. I investigate whether GPs
can be made robust against such effects by adapting student model
hyperparameters to maximize the evidence (data likelihood). An
approximation for the average evidence is derived and used to predict
the optimal hyperparameter values and the resulting generalization
error. For large input space dimension, where the approximation
becomes exact, Bayes-optimal performance is obtained at the evidence
maximum, but the actual hyperparameters (e.g. the noise level) do not
necessarily reflect the properties of the teacher. Also, the
theoretically achievable evidence maximum cannot always be reached
with the chosen set of hyperparameters, and maximizing the evidence in
such cases can actually make generalization performance worse rather
than better. In lower-dimensional learning scenarios,
the theory predicts---in excellent qualitative and good quantitative
accord with simulations---that evidence maximization eliminates logarithmically slow learning and recovers the optimal scaling of the decrease of generalization error with training set size.
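
As a rough illustration of what "adapting student model hyperparameters to maximize the evidence" means in practice, the sketch below computes the log evidence log p(y | X, theta) = -(1/2) y^T K^{-1} y - (1/2) log det K - (n/2) log(2 pi), with K the student covariance matrix plus noise variance on the diagonal, and picks the hyperparameters that maximize it on a small grid. It does not reproduce the average-case approximation discussed above. The squared-exponential student kernel, the sinusoidal teacher, the grid values and all function names (`rbf_kernel`, `log_evidence`, `maximize_evidence`) are assumptions made for this example only.

```python
import math
import random


def rbf_kernel(x1, x2, lengthscale, amplitude=1.0):
    """Squared-exponential covariance between two scalar inputs (assumed student kernel)."""
    return amplitude * math.exp(-0.5 * ((x1 - x2) / lengthscale) ** 2)


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def log_evidence(xs, ys, lengthscale, noise_var):
    """Log marginal likelihood of a zero-mean GP student with Gaussian noise."""
    n = len(xs)
    k = [[rbf_kernel(xs[i], xs[j], lengthscale) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    low = cholesky(k)
    # Forward substitution solves L z = y, so that y^T K^{-1} y = z^T z.
    z = []
    for i in range(n):
        z.append((ys[i] - sum(low[i][m] * z[m] for m in range(i))) / low[i][i])
    quad = sum(v * v for v in z)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return -0.5 * quad - 0.5 * log_det - 0.5 * n * math.log(2.0 * math.pi)


def maximize_evidence(xs, ys, lengthscales, noise_vars):
    """Grid search returning (log evidence, lengthscale, noise variance) at the maximum."""
    best = None
    for ls in lengthscales:
        for nv in noise_vars:
            le = log_evidence(xs, ys, ls, nv)
            if best is None or le > best[0]:
                best = (le, ls, nv)
    return best


# Toy data from a hypothetical teacher: a noisy sine wave, not itself a GP draw.
random.seed(0)
xs = [i / 20 for i in range(40)]
ys = [math.sin(2 * math.pi * x) + random.gauss(0.0, 0.1) for x in xs]

lengthscale_grid = [0.05, 0.1, 0.2, 0.4, 0.8]
noise_grid = [0.001, 0.01, 0.1, 1.0]
best_le, best_ls, best_nv = maximize_evidence(xs, ys, lengthscale_grid, noise_grid)
print(f"log evidence {best_le:.3f} at lengthscale {best_ls}, noise variance {best_nv}")
```

The selected noise variance need not equal the teacher's true noise level when the student kernel is mismatched, which is the kind of effect the analysis above addresses. A gradient-based optimizer would normally replace the grid search; the grid is used here only to keep the sketch short.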