PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Model selection via testing: an alternative to (penalized) maximum likelihood estimators
Lucien Birgé
Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, Volume 42, pp. 273-325, 2006. ISSN 0246-0203


This paper is devoted to the definition and study of a family of model selection oriented estimators that we shall call T-estimators ("T" for tests). Their construction is based on former ideas about deriving estimators from some families of tests due to Le Cam ([46] and [47]) and Birgé ([8], [9] and [10]) and about complexity based model selection from Barron and Cover [6]. It is well known that maximum likelihood estimators and, more generally, minimum contrast estimators suffer from various weaknesses, as do their penalized versions. In particular, they are not robust and they require restrictive assumptions on both the models and the underlying parameter set to work correctly. We propose an alternative construction, which derives an estimator from many simultaneous tests between some probability balls in a suitable metric space. In many cases, although not in all, it results in a penalized M-estimator restricted to a suitable countable set of parameters. On the one hand, this construction should be considered as a theoretical rather than a practical tool because of its high computational complexity. On the other hand, it solves many of the previously mentioned difficulties provided that the tests involved in our construction exist, which is the case for various statistical frameworks including density estimation from i.i.d. variables or estimating the mean of a Gaussian sequence with a known variance. For all such frameworks, the robustness properties of our estimators allow us to deal with minimax estimation and model selection in a unified way, since bounding the minimax risk amounts to performing our method with a single, well-chosen model. This results, for those frameworks, in simple bounds for the minimax risk solely based on some metric properties of the parameter space. Moreover, the method applies to various statistical frameworks and can handle essentially all types of models, linear or not, parametric and non-parametric, simultaneously. It also provides a simple way of aggregating preliminary estimators. From these viewpoints, it is much more flexible than traditional methods and allows one to derive some results that do not presently seem to be accessible to them.

EPrint Type: Article
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Learning/Statistics & Optimisation
ID Code: 2877
Deposited By: Lucien Birgé
Deposited On: 22 November 2006