Model selection via testing: an alternative to (penalized) maximum likelihood estimators

## Abstract

This paper is devoted to the definition and study of a family of model-selection-oriented estimators that we shall call T-estimators ("T" for tests). Their construction is based on earlier ideas about deriving estimators from families of tests, due to Le Cam ([46] and [47]) and Birgé ([8], [9] and [10]), and about complexity-based model selection, from Barron and Cover [6]. It is well known that maximum likelihood estimators and, more generally, minimum contrast estimators suffer from various weaknesses, as do their penalized versions. In particular, they are not robust, and they require restrictive assumptions on both the models and the underlying parameter set to work correctly. We propose an alternative construction, which derives an estimator from many simultaneous tests between probability balls in a suitable metric space. In many cases, although not in all, it results in a penalized M-estimator restricted to a suitable countable set of parameters. On the one hand, this construction should be considered a theoretical rather than a practical tool because of its high computational complexity. On the other hand, it solves many of the previously mentioned difficulties provided that the tests involved in our construction exist, which is the case for various statistical frameworks, including density estimation from i.i.d. variables and estimation of the mean of a Gaussian sequence with known variance. For all such frameworks, the robustness properties of our estimators allow us to deal with minimax estimation and model selection in a unified way, since bounding the minimax risk amounts to performing our method with a single, well-chosen model. This results, for those frameworks, in simple bounds for the minimax risk based solely on some metric properties of the parameter space.
Moreover, the method applies to various statistical frameworks and can handle essentially all types of models, linear or not, parametric and non-parametric, simultaneously. It also provides a simple way of aggregating preliminary estimators. From these viewpoints, it is much more flexible than traditional methods and allows one to derive some results that do not presently seem to be accessible to them.