PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Information-Theoretically Optimal Histogram Density Estimation
Petri Kontkanen and Petri Myllymäki
(2006) Technical Report. Helsinki Institute for Information Technology, Helsinki, Finland.

Abstract

We regard histogram density estimation as a model selection problem. Our approach is based on the information-theoretic minimum description length (MDL) principle. MDL-based model selection is formalized via the normalized maximum likelihood (NML) distribution, which has several desirable optimality properties. We show how this approach can be applied to learning generic, irregular (variable-width bin) histograms, and how to compute the model selection criterion efficiently. We also derive a dynamic programming algorithm for finding both the NML-optimal bin count and the cut point locations in polynomial time. Finally, we demonstrate our approach via simulation tests.
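
The abstract does not reproduce the criterion or the algorithm, so the following Python sketch is only an illustration of the general idea, not the report's method. It assumes a simplified score, namely the negative maximized log-likelihood of a variable-width histogram plus the log of the multinomial NML normalizer, and it ignores any cost for encoding the cut points. The function names (`multinomial_complexity`, `nml_histogram`) and the user-supplied grid of candidate cut points (`edges`) are hypothetical. The normalizer is computed with the known recurrence C(k, n) = C(k-1, n) + n/(k-2) * C(k-2, n).

```python
import math
from bisect import bisect_right


def multinomial_complexity(n, max_k):
    """Return c with c[k] = multinomial NML normalizer for k bins, n >= 1 points."""
    c = [0.0, 1.0]  # c[0] is an unused placeholder; c[1] = 1
    if max_k >= 2:
        total = 0.0
        for h in range(n + 1):
            # log of binom(n, h) * (h/n)^h * ((n-h)/n)^(n-h), with 0^0 = 1
            t = math.lgamma(n + 1) - math.lgamma(h + 1) - math.lgamma(n - h + 1)
            if h > 0:
                t += h * math.log(h / n)
            if h < n:
                t += (n - h) * math.log((n - h) / n)
            total += math.exp(t)
        c.append(total)
    for k in range(3, max_k + 1):
        c.append(c[k - 1] + n / (k - 2) * c[k - 2])
    return c


def nml_histogram(data, edges, max_bins):
    """Pick bin count and cut points (a subset of `edges`) minimizing the
    simplified score. Assumes `edges` is sorted and spans all of `data`."""
    n = len(data)
    xs = sorted(data)
    E = len(edges) - 1  # number of elementary intervals
    cum = [bisect_right(xs, e) for e in edges]  # points <= each edge
    cum[0] = 0  # points equal to the left edge belong to the first bin

    def bin_loglik(i, j):
        # Maximized log-likelihood contribution of the bin (edges[i], edges[j]]
        h = cum[j] - cum[i]
        if h == 0:
            return 0.0
        return h * math.log(h / (n * (edges[j] - edges[i])))

    max_bins = min(max_bins, E)
    neg_inf = float("-inf")
    # best[k][e]: best log-likelihood covering edges[0..e] with exactly k bins
    best = [[neg_inf] * (E + 1) for _ in range(max_bins + 1)]
    back = [[0] * (E + 1) for _ in range(max_bins + 1)]
    best[0][0] = 0.0
    for k in range(1, max_bins + 1):
        for e in range(k, E + 1):
            for p in range(k - 1, e):
                if best[k - 1][p] == neg_inf:
                    continue
                v = best[k - 1][p] + bin_loglik(p, e)
                if v > best[k][e]:
                    best[k][e] = v
                    back[k][e] = p

    comp = multinomial_complexity(n, max_bins)
    scores = {k: -best[k][E] + math.log(comp[k]) for k in range(1, max_bins + 1)}
    k_best = min(scores, key=scores.get)

    # Backtrack from the right-most edge to recover the chosen boundaries
    idx = [E]
    e = E
    for k in range(k_best, 0, -1):
        e = back[k][e]
        idx.append(e)
    idx.reverse()
    return k_best, [edges[i] for i in idx], scores[k_best]


# Two dense clusters separated by an empty stretch
data = [(i + 1) / 50 for i in range(50)] + [9 + (i + 1) / 50 for i in range(50)]
k, bounds, score = nml_histogram(data, list(range(11)), max_bins=6)
print(k, bounds)
```

With E candidate intervals and at most K bins, the triple loop runs in O(K E^2) time, which is polynomial in the size of the candidate grid. Because the penalty in this simplified score depends only on the bin count and the sample size, the dynamic program can maximize the likelihood separately for each bin count and add the penalty afterwards.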

EPrint Type: Monograph (Technical Report)
Project Keyword: UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics
ID Code: 2932
Deposited By: Petri Kontkanen
Deposited On: 23 November 2006