When Data Compression and Statistics Disagree: Two Frequentist Challenges for the Minimum Description Length Principle
Tim van Erven
PhD thesis, Leiden University.
According to the minimum description length (MDL) principle, data
compression should be taken as the main goal of statistical inference.
This stands in sharp contrast to making assumptions about an underlying
"true" distribution generating the data, as is standard in the
traditional frequentist approach to statistics. If the MDL premise of
making data compression a fundamental notion can hold its ground, it
promises a robust kind of statistics, which does not break down when
standard, but hard to verify, assumptions are not completely satisfied.
This makes it worthwhile to put data compression to the test, and see
whether it really makes sense as a foundation for statistics. A natural
starting point is to examine cases where standard MDL methods show suboptimal
performance in a traditional frequentist analysis. This thesis analyses
two such cases.
In the first case it is found that although the standard MDL method
fails, data compression still makes sense and actually leads to the
solution of the problem. In the second case we discuss a modification of
the standard MDL estimator, proposed in the literature, that goes
against the data compression principles of MDL. We also review the
basic properties of Rényi's dissimilarity measure for probability