Information-Theoretically Optimal Histogram Density Estimation
Petri Kontkanen and Petri Myllymäki
Helsinki Institute for Information Technology, Helsinki, Finland.
We regard histogram density estimation as a model selection problem.
Our approach is based on the information-theoretic minimum description
length (MDL) principle. MDL-based model selection is formalized via
the normalized maximum likelihood (NML) distribution, which has
several desirable optimality properties. We show how this approach
can be applied to learning generic, irregular (variable-width bin)
histograms, and how to compute the model selection criterion
efficiently. We also derive a dynamic programming algorithm for
finding both the NML-optimal bin count and the cut point locations in
polynomial time. Finally, we demonstrate our approach via simulation