PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

When Data Compression and Statistics Disagree: Two Frequentist Challenges for the Minimum Description Length Principle
Tim van Erven
(2010) PhD thesis, Leiden University.

Abstract

According to the minimum description length (MDL) principle, data compression should be taken as the main goal of statistical inference. This stands in sharp contrast to the traditional frequentist approach to statistics, which makes assumptions about an underlying "true" distribution generating the data. If the MDL premise of making data compression a fundamental notion can hold its ground, it promises a robust kind of statistics that does not break down when standard but hard-to-verify assumptions are not completely satisfied. This makes it worthwhile to put data compression to the test and see whether it really makes sense as a foundation for statistics. A natural starting point is provided by cases in which standard MDL methods show suboptimal performance under a traditional frequentist analysis. This thesis analyses two such cases. In the first case it is found that, although the standard MDL method fails, data compression still makes sense and in fact leads to a solution of the problem. In the second case we discuss a modification of the standard MDL estimator that has been proposed in the literature, which goes against its data compression principles. We also review the basic properties of Rényi's dissimilarity measure for probability distributions.
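
As general background only (these are standard textbook forms, not quoted from the thesis, and the symbols below are generic placeholders rather than the thesis's own notation): in its simplest two-part form, MDL selects the hypothesis H from a countable class \mathcal{H} that minimises the total code length of hypothesis plus data,

\[ L(D) = \min_{H \in \mathcal{H}} \bigl( L(H) + L(D \mid H) \bigr), \]

and the Rényi dissimilarity (divergence) of order \alpha \in (0,1) \cup (1,\infty) between distributions P and Q, with densities p and q relative to a common measure \mu, is

\[ D_\alpha(P \,\|\, Q) = \frac{1}{\alpha - 1} \log \int p^{\alpha} q^{1-\alpha} \, d\mu, \]

which recovers the Kullback-Leibler divergence in the limit \alpha \to 1.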

EPrint Type: Thesis (PhD)
Subjects: Computational, Information-Theoretic Learning with Statistics
ID Code: 7241
Deposited By: Tim van Erven
Deposited On: 14 March 2011