# Replica theory for learning curves for Gaussian processes on random graphs

## Abstract

Statistical physics approaches can be used to derive accurate predictions for the performance of inference methods learning from potentially noisy data, as quantified by the learning curve: the average error as a function of the number of training examples. We analyse a challenging problem in the area of non-parametric inference, where an effectively infinite number of parameters has to be learned, specifically Gaussian process regression. When the inputs are vertices on a random graph and the outputs are noisy function values, we show that replica techniques can be used to obtain exact performance predictions in the limit of large graphs. The covariance of the Gaussian process prior is defined by a random walk kernel, the discrete analogue of squared exponential kernels on continuous spaces. Conventionally this kernel is normalised only globally, so that the prior variance can differ between vertices; as a more principled alternative we consider local normalisation, where the prior variance is uniform. The starting point is to represent the average error as the derivative of an appropriate partition function. We rewrite this in terms of a graphical model in which only neighbouring vertices are directly coupled. Treating the average over training examples and random graphs in a replica approach then yields learning curve predictions for the globally normalised kernel. The results apply generically to all random graph ensembles constrained by a fixed but arbitrary degree distribution. For the locally normalised kernel, the normalisation constants of the prior have to be defined as thermal averages in an unnormalised model. This case is therefore technically more difficult and requires the introduction of a second, auxiliary set of replicas.
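The distinction between global and local normalisation of the random walk kernel can be made concrete in code. The following is a minimal NumPy sketch, not the paper's implementation: it assumes a random walk kernel of the form $K \propto \left((1 - a^{-1})I + a^{-1} D^{-1/2} A D^{-1/2}\right)^p$ (with $A$ the adjacency matrix and $D$ the degree matrix), an Erdős–Rényi graph, and illustrative parameter values `a` and `p`.

```python
import numpy as np

rng = np.random.default_rng(0)

def erdos_renyi_adjacency(n, c, rng):
    """Adjacency matrix of an Erdos-Renyi graph with mean degree c."""
    upper = np.triu(rng.random((n, n)) < c / (n - 1), 1)
    return (upper + upper.T).astype(float)

def random_walk_kernel(A, a=2.0, p=10):
    """Random walk kernel K = ((1 - 1/a) I + (1/a) D^{-1/2} A D^{-1/2})^p.

    The parameters a and p are illustrative choices, not values from the text.
    """
    d = np.maximum(A.sum(axis=1), 1.0)  # guard isolated vertices
    W = A / np.sqrt(np.outer(d, d))     # D^{-1/2} A D^{-1/2}
    n = A.shape[0]
    M = (1 - 1.0 / a) * np.eye(n) + (1.0 / a) * W
    return np.linalg.matrix_power(M, p)

A = erdos_renyi_adjacency(200, 3.0, rng)
K = random_walk_kernel(A)

# Global normalisation: a single constant makes the *average* prior variance
# equal to 1, but individual vertices can still have different variances.
K_glob = K / np.mean(np.diag(K))

# Local normalisation: rescale so every vertex has prior variance exactly 1.
s = 1.0 / np.sqrt(np.diag(K))
K_loc = K * np.outer(s, s)

print(np.allclose(np.diag(K_loc), 1.0))  # True: uniform prior variance
```

With `a = 2` the matrix `M` has eigenvalues in `[0, 1]`, so the kernel is positive semi-definite; the local rescaling `K * np.outer(s, s)` preserves this because it is a congruence transform by a positive diagonal matrix.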
We compare our predictions with numerically simulated learning curves for two paradigmatic graph ensembles: Erdős–Rényi graphs with a Poisson degree distribution, and an ensemble with a power law degree distribution. We find excellent agreement between our predictions and the numerical simulations. We also compare our predictions with existing learning curve approximations; these show significant deviations from the exact results, and we briefly analyse where the deviations arise.
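A numerically simulated learning curve of the kind the predictions are compared against can be obtained by straightforward Monte Carlo: draw a teacher function from the GP prior, observe noisy values at randomly chosen training vertices, and measure the average squared error of the GP posterior mean over all vertices. The sketch below assumes a locally normalised random walk kernel on a small Erdős–Rényi graph; the graph size, kernel parameters `a` and `p`, and noise level `sigma2` are illustrative choices, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# Small Erdos-Renyi graph (mean degree c) and a random walk kernel
# K = ((1 - 1/a) I + (1/a) D^{-1/2} A D^{-1/2})^p, locally normalised
# so that every vertex has unit prior variance.
V, c, a, p = 150, 3.0, 2.0, 10
A = np.triu(rng.random((V, V)) < c / (V - 1), 1).astype(float)
A = A + A.T
d = np.maximum(A.sum(axis=1), 1.0)
W = A / np.sqrt(np.outer(d, d))
K = np.linalg.matrix_power((1 - 1 / a) * np.eye(V) + W / a, int(p))
s = 1.0 / np.sqrt(np.diag(K))
K = K * np.outer(s, s)

sigma2 = 0.1  # observation noise variance (illustrative)
L = np.linalg.cholesky(K + 1e-9 * np.eye(V))  # jitter for numerical stability

def mc_error(n, n_draws=30):
    """Average error for n examples, over random teachers and example sets."""
    err = 0.0
    for _ in range(n_draws):
        f = L @ rng.standard_normal(V)             # teacher drawn from the GP prior
        idx = rng.choice(V, size=n, replace=True)  # random training vertices
        y = f[idx] + np.sqrt(sigma2) * rng.standard_normal(n)
        G = K[np.ix_(idx, idx)] + sigma2 * np.eye(n)
        mean = K[:, idx] @ np.linalg.solve(G, y)   # GP posterior mean everywhere
        err += np.mean((mean - f) ** 2)
    return err / n_draws

ns = [1, 5, 20, 80]
curve = [mc_error(n) for n in ns]
print(curve)  # average error decreasing with n: the simulated learning curve
```

Averaging `mc_error` over many draws for each `n` traces out the learning curve; the replica predictions described above are exact in the limit of large graphs, so agreement with such simulations improves as `V` grows.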