## AbstractMetric entropy and generic chaining methods are powerful tools from probabil- ity theory that can be used to study pathwise properties of stochastic processes. Despite this fact they have largely been ignored in machine learning. We demon- strate their power in this work in applying them to a bandit problem with a Gaussian process prior. The diculty of the setting lies in the fact that we are dealing with a continuous space of arms and we need to control the supremum of a reward process on the arms. We apply the so called Dudley integral to reduce the problem of controlling the supremum of a \dicult" stochastic process to the problem of bounding a canonical metric that is based solely on the covariance function (which is an analytical and thus \simple" object). We consider the sce- nario in which there is no noise in the observed reward. Our main result is to bound the regret experienced by algorithms relative to the a posteriori optimal strategy of playing the best arm throughout based on benign assumptions about the covariance function dening the Gaussian process.
[Edit] |