NML Computation Algorithms for Tree-Structured Multinomial Bayesian Networks
Petri Kontkanen, Hannes Wettig and Petri Myllymäki
EURASIP Journal on Bioinformatics and Systems Biology
Typical problems in bioinformatics involve large discrete datasets.
Applying statistical methods in such domains therefore requires
efficient algorithms suitable for discrete
data. The minimum description length (MDL) principle is a
theoretically well-founded, general framework for performing
statistical inference. The mathematical formalization of MDL is based
on the normalized maximum likelihood (NML) distribution, which has
several desirable theoretical properties. In the case of discrete
data, straightforward computation of the NML distribution requires
exponential time with respect to the sample size, since the definition
involves a sum over all possible data samples of a fixed size. In
this paper, we first review existing algorithms for efficient NML
computation for the multinomial and Naive Bayes model families.
We then extend these algorithms to more complex,
tree-structured Bayesian networks.
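
To make the cost of the straightforward computation concrete, the following minimal Python sketch computes the NML normalizing sum for a single multinomial variable with K values and sample size n in two ways. The first enumerates all K^n samples, as the definition suggests. The second uses a recurrence over K known from the multinomial NML literature; it is not stated in this abstract and is included here as an assumption, checked only against the brute-force sum. All function names are hypothetical and chosen for illustration.

```python
from collections import Counter
from itertools import product
from math import comb


def max_likelihood(seq):
    """Maximized multinomial likelihood of one sample: prod_k (h_k / n)^h_k."""
    n = len(seq)
    p = 1.0
    for count in Counter(seq).values():
        p *= (count / n) ** count
    return p


def nml_normalizer_brute_force(K, n):
    """Sum the maximized likelihood over all K**n samples (exponential in n)."""
    return sum(max_likelihood(seq) for seq in product(range(K), repeat=n))


def nml_normalizer_recurrence(K, n):
    """Same quantity via C(k, n) = C(k-1, n) + n / (k-2) * C(k-2, n).

    Assumption: recurrence taken from the multinomial NML literature,
    not from the abstract above. Requires K >= 1 and n >= 1.
    """
    c_prev = 1.0  # C(1, n): a single-valued variable has one possible sample
    if K == 1:
        return c_prev
    # C(2, n): direct sum over the count h of the first value
    c_curr = sum(
        comb(n, h) * (h / n) ** h * ((n - h) / n) ** (n - h)
        for h in range(n + 1)
    )
    for k in range(3, K + 1):
        c_prev, c_curr = c_curr, c_curr + n / (k - 2) * c_prev
    return c_curr


def nml_probability(seq, K):
    """NML probability of a sample: maximized likelihood over the normalizer."""
    return max_likelihood(seq) / nml_normalizer_recurrence(K, len(seq))


if __name__ == "__main__":
    K, n = 3, 6
    print(nml_normalizer_brute_force(K, n))  # enumerates 3**6 = 729 samples
    print(nml_normalizer_recurrence(K, n))   # a handful of arithmetic steps
```

The brute-force version becomes impractical already for moderate n, whereas the recurrence needs on the order of n + K arithmetic operations. Both are only sketches for the single-variable multinomial case and do not cover the Naive Bayes or tree-structured models.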