PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Aggregation to compete the best prediction function in a finite set
Jean-Yves Audibert
In: Probability and Statistics in Science and Technology, ISI 2007, Sep 2007, Porto, Portugal.


We consider the learning task of predicting as well as the best function in a finite reference set G, up to the smallest possible additive term. If R(g) denotes the generalization error of a prediction function g, then under reasonable assumptions on the loss function (typically satisfied by the least squares loss when the output is bounded), it is known that the progressive mixture rule g_n satisfies E R(g_n) < min_{g in G} R(g) + C (log|G|)/n, where n denotes the size of the training set, E denotes the expectation w.r.t. the training set distribution, and C denotes a positive constant. On the one hand, we will see that for any training set size n, there exist a>0, a reference set G and a probability distribution generating the data such that, with probability at least a, R(g_n) > min_{g in G} R(g) + c sqrt{[log(|G|/a)]/n}, where c is a positive constant. In other words, surprisingly, for an appropriate reference set G, the deviation convergence rate of the progressive mixture rule is only of order 1/sqrt{n}, while its expectation convergence rate is of order 1/n. On the other hand, we will present an algorithm which has both deviation and expectation convergence rates of order 1/n.
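
The abstract does not spell out the progressive mixture rule, so the following is only a minimal sketch under stated assumptions: a uniform prior over G, the squared loss, and the commonly used form in which g_n is the average of the n+1 exponentially weighted mixtures built from the first 0, 1, ..., n training examples. The function name `progressive_mixture_predict` and the temperature parameter `lam` are hypothetical and chosen for illustration; they are not taken from the talk.

```python
import math

def progressive_mixture_predict(candidates, xs, ys, x_new, lam=1.0):
    """Sketch of a progressive mixture prediction at x_new.

    Assumptions (not from the abstract): uniform prior over the
    candidates, squared loss, and inverse temperature `lam`.
    """
    n = len(xs)
    cum_loss = [0.0] * len(candidates)        # cumulative loss of each g in G
    preds = [g(x_new) for g in candidates]    # each candidate's prediction at x_new
    total = 0.0
    for i in range(n + 1):
        # Exponential weights based on the first i examples
        # (losses are shifted by their minimum for numerical stability).
        m = min(cum_loss)
        weights = [math.exp(-lam * (loss - m)) for loss in cum_loss]
        z = sum(weights)
        total += sum(w * p for w, p in zip(weights, preds)) / z
        # Add the loss on example i before forming the next mixture.
        if i < n:
            for j, g in enumerate(candidates):
                cum_loss[j] += (ys[i] - g(xs[i])) ** 2
    # Average of the n+1 mixtures.
    return total / (n + 1)
```

For example, with the two constant candidates `lambda x: 0.0` and `lambda x: 1.0` and outputs that all equal 1, the first mixture is the uniform average and the later ones concentrate on the second candidate, so the prediction moves toward 1 as n grows.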

EPrint Type: Conference or Workshop Item (Invited Talk)
Project Keyword: UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics; Learning/Statistics & Optimisation; Theory & Algorithms
ID Code: 3159
Deposited By: Jean-Yves Audibert
Deposited On: 30 December 2007