PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

NML Computation Algorithms for Tree-Structured Multinomial Bayesian Networks
Petri Kontkanen, Hannes Wettig and Petri Myllymäki
EURASIP Journal on Bioinformatics and Systems Biology, 2008. ISSN 1687-4145


Typical problems in bioinformatics involve large discrete datasets. Therefore, in order to apply statistical methods in such domains, it is important to develop efficient algorithms suitable for discrete data. The minimum description length (MDL) principle is a theoretically well-founded, general framework for performing statistical inference. The mathematical formalization of MDL is based on the normalized maximum likelihood (NML) distribution, which has several desirable theoretical properties. In the case of discrete data, straightforward computation of the NML distribution requires time exponential in the sample size, since its definition involves a sum over all possible data samples of a fixed size. In this paper, we first review some existing algorithms for efficient NML computation for the multinomial and Naive Bayes model families. We then extend these algorithms to more complex, tree-structured Bayesian networks.
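As an illustration of the multinomial case only (not the paper's tree-structured algorithm), the Python sketch below computes the NML normalizing sum for a single K-valued multinomial variable and n observations. The sum over all K^n data sequences is first reduced by grouping sequences with equal value counts (h_1, ..., h_K), giving C(K, n) = sum of n!/(h_1!...h_K!) * prod_k (h_k/n)^(h_k) over all count vectors with h_1 + ... + h_K = n. The brute-force function enumerates these count vectors directly; the fast function uses a known linear-time recurrence, C(k+2, n) = C(k+1, n) + (n/k) * C(k, n), seeded with C(1, n) = 1 and an O(n) sum for C(2, n). All function names are hypothetical, and plain floating-point arithmetic is assumed to be accurate enough for the small n used here.

```python
import math
from itertools import product


def brute_force_normalizer(K: int, n: int) -> float:
    """C(K, n) by enumerating every count vector (h_1, ..., h_K) summing to n."""
    if n == 0:
        return 1.0  # only the empty sample exists
    total = 0.0
    for counts in product(range(n + 1), repeat=K):
        if sum(counts) != n:
            continue
        # Number of sequences with these counts: n! / (h_1! ... h_K!)
        coef = math.factorial(n)
        for h in counts:
            coef //= math.factorial(h)
        # Maximized likelihood of any such sequence, with 0^0 = 1
        lik = 1.0
        for h in counts:
            lik *= (h / n) ** h
        total += coef * lik
    return total


def binary_normalizer(n: int) -> float:
    """C(2, n) as an O(n) sum over the count h of the first value."""
    if n == 0:
        return 1.0
    return sum(
        math.comb(n, h) * (h / n) ** h * ((n - h) / n) ** (n - h)
        for h in range(n + 1)
    )


def multinomial_normalizer(K: int, n: int) -> float:
    """C(K, n) via C(k+2, n) = C(k+1, n) + (n / k) * C(k, n), O(n + K) time."""
    if K < 1:
        raise ValueError("K must be at least 1")
    c_prev, c_curr = 1.0, binary_normalizer(n)  # C(1, n), C(2, n)
    if K == 1:
        return c_prev
    for k in range(1, K - 1):
        # Shift the window: (C(k, n), C(k+1, n)) -> (C(k+1, n), C(k+2, n))
        c_prev, c_curr = c_curr, c_curr + (n / k) * c_prev
    return c_curr


def nml_code_length(counts: list[int]) -> float:
    """Stochastic complexity -log2 P_NML(x^n) in bits, given the value counts of x^n."""
    n = sum(counts)
    log2_ml = sum(h * math.log2(h / n) for h in counts if h > 0)
    return -log2_ml + math.log2(multinomial_normalizer(len(counts), n))


if __name__ == "__main__":
    print(multinomial_normalizer(3, 2), brute_force_normalizer(3, 2))
    print(nml_code_length([3, 1, 0]))
```

For example, multinomial_normalizer(3, 2) and brute_force_normalizer(3, 2) both give 4.5. The brute-force loop touches (n+1)^K tuples, while the recurrence needs one O(n) sum plus K - 2 constant-time steps. The tree-structured case treated in the paper requires combining such terms across parent-child configurations and is not attempted here.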

EPrint Type: Article
Subjects: Computational, Information-Theoretic Learning with Statistics; Theory & Algorithms
ID Code: 3545
Deposited By: Petri Kontkanen
Deposited On: 11 February 2008