Safe Learning: bridging the gap between Bayes, MDL and statistical learning theory via empirical convexity
In: COLT 2011, 9-11 July 2011, Budapest, Hungary.
We extend Bayesian MAP and Minimum Description Length (MDL) learning by testing whether the data can be compressed substantially further by a mixture of the MDL/MAP distribution with another element of the model, and adjusting the learning rate if this is the case. While standard Bayes and MDL can fail to converge if the model is wrong, the resulting "safe" estimator continues to achieve good rates with wrong models. Moreover, when applied to classification and regression models as considered in statistical learning theory, the approach achieves optimal rates under, e.g., Tsybakov's conditions, and reveals new situations in which we can penalize by (-log prior)/n rather than the square root thereof.
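
To make the two penalization regimes mentioned above concrete, the following is a rough sketch in generic notation. All symbols are assumptions introduced here for illustration and are not taken from the paper: \pi is the prior, \ell_\theta a loss function, z_1, ..., z_n the data, \eta > 0 a learning rate, and C > 0 an unspecified constant.

```latex
% Illustrative notation only (assumed, not the paper's own):
% \pi = prior, \ell_\theta = loss, z_1,\dots,z_n = data,
% \eta > 0 = learning rate, C > 0 = unspecified constant.

% Penalizing by (-log prior)/n, scaled by the learning rate:
\hat\theta_{\eta} \in \arg\min_{\theta \in \Theta}
  \left\{ \frac{1}{n}\sum_{i=1}^{n} \ell_\theta(z_i)
          + \frac{-\log \pi(\theta)}{\eta\, n} \right\}

% Penalizing by the square root of the same quantity:
\hat\theta_{\mathrm{sqrt}} \in \arg\min_{\theta \in \Theta}
  \left\{ \frac{1}{n}\sum_{i=1}^{n} \ell_\theta(z_i)
          + C \sqrt{\frac{-\log \pi(\theta)}{n}} \right\}
```

Under these assumptions, taking \ell_\theta(z) = -\log p_\theta(z) and \eta = 1 in the first display gives the ordinary MAP (two-part MDL) estimator, since minimizing it is the same as maximizing \pi(\theta) \prod_i p_\theta(z_i). Lowering \eta puts more weight on the prior term, which is one way to read "adjusting the learning rate".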