Kolmogorov's Structure Functions and Model Selection
Nikolai K. Vereshchagin and Paul M.B. Vitanyi
IEEE Transactions on Information Theory, to appear, 2003.

## Abstract

In 1974 Kolmogorov proposed a non-probabilistic approach to statistics, an individual combinatorial relation between the data and its model, expressed by the so-called "structure function" of the data. We show that the structure function determines all stochastic properties of the data in the sense of determining the best-fitting model at every model-complexity level. A consequence is this: minimizing the data-to-model code length (finding the ML estimator or MDL estimator), in a class of contemplated models of prescribed maximal (Kolmogorov) complexity, *always* results in a model of best fit, irrespective of whether the source producing the data is in the model class considered. In this setting, code minimization *always* separates optimal model information from the remaining accidental information, and not only with high probability. The function that maps the maximal allowed model complexity to the goodness-of-fit (expressed as minimal "randomness deficiency") of the best model cannot itself be monotonically approximated. However, the shortest one-part or two-part code above can be, which implicitly optimizes this elusive goodness-of-fit. We show that, within the obvious constraints, every graph is realized by the structure function of some data. We determine the (un)computability properties of the various functions contemplated and of the "algorithmic minimal sufficient statistic."

EPrint Type: Article
Project Keyword: UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics; Learning/Statistics & Optimisation
ID Code: 126
Deposited By: Paul Vitányi
Deposited On: 27 May 2004