PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Fast NML Computation for Naive Bayes Models
Tommi Mononen and Petri Myllymäki
In: 10th International Conference on Discovery Science (DS-2007), 1-4 Oct 2007, Sendai, Japan.

Abstract

Minimum Description Length (MDL) is an information-theoretic principle that can be used for model selection and other statistical inference tasks. One way to implement this principle in practice is to compute the Normalized Maximum Likelihood (NML) distribution for a given parametric model class. Unfortunately, the NML normalizing constant is a sum of maximized likelihoods over all possible data sets of size n, and computing it is infeasible for many model classes of practical importance. In this paper we present a fast algorithm for computing the NML for the Naive Bayes model class, which is frequently used in classification and clustering tasks. The algorithm is based on a relationship between powers of generating functions and discrete convolution. The resulting algorithm has time complexity O(n^2), where n is the size of the data.
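The central idea of the abstract is that raising a generating function to a power amounts to repeated discrete convolution of its coefficient sequence. A short Python sketch can illustrate this. It assumes the standard decomposition of the Naive Bayes normalizer into multinomial normalizers C(K, h). It also uses the generating function B(z) = sum_h h^h z^h / h!, for which C(K, n) = n!/n^n [z^n] B(z)^K. All function names (`convolve_truncated`, `series_power`, `tree_coeffs`, `multinomial_complexities`, `naive_bayes_complexity`) are hypothetical. This is not claimed to be the authors' algorithm. It raises series to the K-th power by binary exponentiation, which costs O(n^2 log K) rather than the O(n^2) the paper reports. It also uses plain floats, so it is only meant for modest n.

```python
import math

def convolve_truncated(a, b, n):
    """Discrete convolution of coefficient lists a and b, keeping only z^0..z^n."""
    c = [0.0] * (n + 1)
    for i in range(n + 1):
        if a[i] == 0.0:
            continue
        for j in range(n + 1 - i):
            c[i + j] += a[i] * b[j]
    return c

def series_power(a, k, n):
    """Coefficients of A(z)^k up to z^n, by repeated convolution (binary exponentiation)."""
    result = [1.0] + [0.0] * n
    base = list(a)
    while k > 0:
        if k & 1:
            result = convolve_truncated(result, base, n)
        k >>= 1
        if k:
            base = convolve_truncated(base, base, n)
    return result

def tree_coeffs(n):
    """b_h = h^h / h! for h = 0..n, the coefficients of B(z) (Python gives 0**0 == 1)."""
    return [h ** h / math.factorial(h) for h in range(n + 1)]

def multinomial_complexities(k, n):
    """C(k, h) = h!/h^h * [z^h] B(z)^k for h = 0..n (multinomial NML normalizers)."""
    p = series_power(tree_coeffs(n), k, n)
    return [math.factorial(h) / h ** h * p[h] for h in range(n + 1)]

def naive_bayes_complexity(k, attr_values, n):
    """Normalizer for a Naive Bayes model with a k-valued class variable and one
    attribute per entry of attr_values (that attribute's number of values):
    n!/n^n * [z^n] A(z)^k, where a_h = h^h/h! * prod_i C(attr_values[i], h)."""
    a = tree_coeffs(n)
    for k_i in attr_values:
        c = multinomial_complexities(k_i, n)
        a = [a[h] * c[h] for h in range(n + 1)]
    p = series_power(a, k, n)
    return math.factorial(n) / n ** n * p[n]

# Usage: normalizer for n = 2 with a binary class variable and one binary attribute.
print(naive_bayes_complexity(2, [2], 2))  # 7.0
```

As a sanity check, summing directly over the class-count vectors (2,0), (0,2) and (1,1) gives 2.5 + 2.5 + 2 = 7, which matches the convolution result. With an empty attribute list the function reduces to the multinomial normalizer C(k, n).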

EPrint Type: Conference or Workshop Item (Paper)
Subjects: Computational, Information-Theoretic Learning with Statistics; Theory & Algorithms
ID Code: 3410
Deposited By: Tommi Mononen
Deposited On: 10 February 2008