## Abstract

We consider the learning task of predicting as well as the best function in a finite reference set G, up to the smallest possible additive term. If R(g) denotes the generalization error of a prediction function g, then under reasonable assumptions on the loss function (typically satisfied by the least squares loss when the output is bounded), the progressive mixture rule g_n is known to satisfy E R(g_n) < min_{g in G} R(g) + C (log|G|)/n, where n denotes the size of the training set, E denotes the expectation w.r.t. the training set distribution, and C is a positive constant. On the one hand, we show that for any training set size n there exist a > 0, a reference set G, and a probability distribution generating the data such that, with probability at least a, R(g_n) > min_{g in G} R(g) + c sqrt{[log(|G|/a)]/n}, where c is a positive constant. In other words, surprisingly, for an appropriate reference set G the deviation convergence rate of the progressive mixture rule is only of order 1/sqrt{n}, while its expectation convergence rate is of order 1/n. On the other hand, we present an algorithm that has both deviation and expectation convergence rates of order 1/n.
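The abstract does not spell out the progressive mixture rule itself. As a point of reference, the sketch below implements its standard formulation (the Cesàro average of exponential-weights mixtures over growing prefixes of the training set) under the square loss with a uniform prior over G; the function name `progressive_mixture_predict` and the temperature parameter `lam` are illustrative choices, not taken from the paper.

```python
import numpy as np

def progressive_mixture_predict(G, X, Y, x_new, lam=1.0):
    """Sketch of the progressive mixture rule (Cesaro-averaged exponential weights).

    G      : list of candidate prediction functions g: x -> real
    X, Y   : training inputs and outputs (length n)
    x_new  : point at which to predict
    lam    : temperature of the exponential weights (assumed tuning parameter;
             the theory ties its admissible range to the loss and output bounds)
    """
    n = len(Y)
    # Cumulative squared loss of each g on the first i examples, for i = 0..n.
    cum_loss = np.zeros(len(G))
    prefix_preds = []
    for i in range(n + 1):
        # Gibbs (exponential) weights over G based on losses up to example i;
        # subtracting the minimum is only for numerical stability.
        w = np.exp(-lam * (cum_loss - cum_loss.min()))
        w /= w.sum()
        # Mixture prediction at x_new built from the first i examples.
        prefix_preds.append(sum(w_g * g(x_new) for w_g, g in zip(w, G)))
        if i < n:
            cum_loss += np.array([(g(X[i]) - Y[i]) ** 2 for g in G])
    # The progressive step: average the n+1 prefix mixtures.
    return float(np.mean(prefix_preds))
```

The averaging over prefixes is what yields the 1/n bound in expectation quoted above; the paper's point is that this same rule can nonetheless deviate from min_{g in G} R(g) by a term of order 1/sqrt{n} with non-negligible probability.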