Algorithmic statistics and Kolmogorov's structure function
P.M.B. Vitanyi
In: Advances in Minimum Description Length (2005), MIT Press, Cambridge, Mass., USA, pp. 151-174. ISBN 0-262-07262-9

## Abstract

Naively speaking, statistics deals with gathering data, ordering and representing data, and using the data to determine the process that causes the data. That this viewpoint is a little too simplistic is immediately clear: suppose that the true cause of a sequence of outcomes of coin flips is a "fair" coin, where both sides come up with equal probability. It is possible that the sequence consists of "heads" only. Suppose that our statistical inference method succeeds in identifying the true cause (fair coin flips) from these data. Such a method is clearly at fault: from an all-heads sequence a good inference should conclude that the cause is a coin with a heavy bias toward "heads", irrespective of what the true cause is. That is, a good inference method must assume that the data is "typical" for the cause: we don't aim at finding the "true" cause, but at finding a cause for which the data is as "typical" as possible. Such a cause is called a *model* for the data. But what if the data consists of a sequence of precise alternations "head-tail"? This is as unlikely an outcome for a fair coin flip as the all-heads sequence. Yet, within the coin-type models we have no alternative but to choose a fair coin. But we know very well that the true cause must be different. For some data it may not even make sense to ask for a "true cause". This suggests that truth is not our goal; rather, within given constraints on the model class we try to find the model for which the data is most "typical" in an appropriate sense, the model that best "fits" the data. Considering the available model class as a magnifying glass, finding the best-fitting model for the data corresponds to finding the position of the magnifying glass that best brings the object into focus.
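The coin example can be checked with a few lines of arithmetic. The following is only an illustrative sketch: it uses maximum likelihood over coin biases as a stand-in for "best fitting", which is an assumption of this sketch and not the formal notion of typicality. The function names and the sequence length of 20 are hypothetical choices made for the illustration.

```python
from fractions import Fraction


def best_fitting_bias(flips: str) -> Fraction:
    """Bias toward heads that maximizes the likelihood of the observed flips.

    Assumption of this sketch: within the class of coin models, "best fit"
    is taken to mean maximum likelihood, which depends only on the head count.
    """
    return Fraction(flips.count("H"), len(flips))


def sequence_probability(flips: str, p_heads: Fraction) -> Fraction:
    """Probability of this exact sequence under independent flips with bias p_heads."""
    heads = flips.count("H")
    tails = len(flips) - heads
    return p_heads ** heads * (1 - p_heads) ** tails


# Two hypothetical data samples of length 20.
all_heads = "H" * 20
alternating = "HT" * 10
fair = Fraction(1, 2)

for name, flips in [("all heads", all_heads), ("alternating", alternating)]:
    print(
        name,
        "| P(sequence | fair coin) =", sequence_probability(flips, fair),
        "| best-fitting bias =", best_fitting_bias(flips),
    )
```

Under the fair coin both sequences have the same probability, $2^{-20}$. The all-heads sample selects a coin with bias 1 toward heads, while the alternating sample selects the fair coin, because coin-type models see only the number of heads and not their order.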
In the coin-flipping example we presented, it is possible that the data have no sharply focused model, but within the allowed resolution (here, ignoring the order of the outcomes and counting only the number of "heads" in the total) we find the best model.

Classically, the setting of statistical inference is as follows: We carry out a probabilistic experiment of which the outcomes are governed by an unknown probability distribution $P$. Suppose we obtain as outcome the data sample $x$. Given $x$, we want to recover the distribution $P$. For certain reasons we can choose a distribution from a set of acceptable distributions only (which may or may not contain $P$). Intuitively, our selection criteria are that (i) $x$ should be a "typical" outcome of the distribution selected, and (ii) the selected distribution has a "simple" description. We need to make the meaning of "typical" and "simple" rigorous and balance the requirements (i) and (ii). In probabilistic statistics one analyzes the average-case performance of the selection process. For traditional problems, dealing with frequencies over small sample spaces, this approach is appropriate. But for current novel applications, average relations are often irrelevant, since the part of the support of the probability density function that will ever be observed has about zero measure. This is the case in, for example, complex video and sound analysis. There arises the problem that for individual cases the selection performance may be bad although the performance is good on average, or vice versa. There is also the problem of what probability means, whether it is subjective, objective, or exists at all. Kolmogorov's proposal outlined here strives for the firmer and less contentious ground expressed in finite combinatorics and effective computation.