On the Multinomial Stochastic Complexity and its Connection to the Birthday Problem
Tommi Mononen and Petri Myllymäki
In: 2008 International Conference on Information Theory and Statistical Learning, 14-17 Jul 2008, Las Vegas, Nevada, USA.
The Minimum Description Length (MDL) principle is an information-theoretic approach to model selection and other statistical inference tasks. A central concept in this framework is stochastic complexity, which is currently defined for a given parametric model class via the Normalized Maximum Likelihood (NML) distribution. In this paper we focus on the parametric model class of a single multinomial variable, as this case forms a very important building block for more complex models. We show that the computationally demanding normalization term of the multinomial NML can be written in a simple and efficient form by using tools from umbral calculus. The time complexity of computing the exact form is O(n), where n is the number of data points. We also give two different descriptions of the normalization term using sets of confluent hypergeometric functions, show an interesting connection between the birthday problem and our problem, and demonstrate how the results can be exploited in practice.
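
To make the normalization term concrete, the following Python sketch computes it in two ways for a multinomial variable with K values and n data points. The first function evaluates the definition directly: the sum, over all count vectors (h_1, ..., h_K) with h_1 + ... + h_K = n, of the multinomial coefficient times the maximized likelihood prod_k (h_k/n)^(h_k). The second evaluates a single sum of n + 1 terms built from falling and rising factorials, which takes O(n) time. The function names are hypothetical, and the single-sum form is an assumption: it is not stated in the abstract above, so the sketch only checks it numerically against the brute-force definition on small cases.

```python
from math import factorial


def multinomial_norm_bruteforce(K, n):
    """Normalization term by direct enumeration (exponential in K, small inputs only).

    Assumes K >= 1 and n >= 1. Uses the convention 0**0 == 1.
    """
    def partial_sum(remaining, bins_left):
        # Sum over the counts of the remaining bins of prod_k (h_k/n)**h_k / h_k!
        if bins_left == 1:
            h = remaining
            return (h / n) ** h / factorial(h)
        total = 0.0
        for h in range(remaining + 1):
            weight = (h / n) ** h / factorial(h)
            total += weight * partial_sum(remaining - h, bins_left - 1)
        return total

    # Multiplying by n! turns the 1/h_k! factors into the multinomial coefficient.
    return factorial(n) * partial_sum(n, K)


def multinomial_norm_linear(K, n):
    """Normalization term as a single sum with n + 1 terms (O(n) time).

    Assumed form (not quoted from the abstract):
        sum_{k=0}^{n} [n falling k] * [(K - 1) rising k] / (n**k * k!)
    Each term is obtained from the previous one by a constant-time update.
    """
    total = 1.0  # k = 0 term
    term = 1.0
    for k in range(1, n + 1):
        term *= (n - k + 1) * (K + k - 2) / (n * k)
        total += term
    return total
```

For example, with K = 2 and n = 2 the definition gives 1 + 2 * (1/2) * (1/2) + 1 = 2.5, and both functions agree on this value. The brute-force version is useful only as a reference for testing, since its cost grows quickly with K.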