Information-Theoretically Optimal Histogram Density Estimation
Abstract
We regard histogram density estimation as a model selection problem. Our approach is based on the information-theoretic minimum description length (MDL) principle. MDL-based model selection is formalized via the normalized maximum likelihood (NML) distribution, which has several desirable optimality properties. We show how this approach can be applied to learning generic, irregular (variable-width bin) histograms, and how to compute the model selection criterion efficiently. We also derive a dynamic programming algorithm that finds both the NML-optimal bin count and the cut point locations in polynomial time. Finally, we demonstrate our approach via simulation tests.
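
For orientation, the NML distribution referred to in the abstract is usually written in the following general form. This is a sketch of the standard textbook definition, not a formula quoted from the paper: the symbols $x^n$ (a data sample of size $n$), $\mathcal{M}$ (a model class, here a histogram with a fixed set of cut points), $\hat{\theta}$ (the maximum likelihood estimator), and $\mathcal{R}^n_{\mathcal{M}}$ (the normalizing term) are notation assumed for illustration.

```latex
P_{\mathrm{NML}}(x^n \mid \mathcal{M})
  = \frac{P\bigl(x^n \mid \hat{\theta}(x^n), \mathcal{M}\bigr)}{\mathcal{R}^n_{\mathcal{M}}},
\qquad
\mathcal{R}^n_{\mathcal{M}}
  = \sum_{y^n} P\bigl(y^n \mid \hat{\theta}(y^n), \mathcal{M}\bigr)
```

The sum runs over all possible samples $y^n$ of size $n$ (an integral for continuous data). Under this definition, model selection amounts to choosing the model class that minimizes $-\log P_{\mathrm{NML}}(x^n \mid \mathcal{M})$, a quantity commonly called the stochastic complexity of the data.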