PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Constrained domain maximum likelihood estimation for naive Bayes text classification
Jesús Andrés and Alfons Juan
Pattern Analysis & Applications 2008. ISSN 1433-7541 (Print) 1433-755X (Online)


The naive Bayes assumption in text classification has the advantage of greatly simplifying maximum likelihood estimation of unknown class-conditional word occurrence probabilities. However, these estimates are usually modified by applying a heuristic parameter smoothing technique to avoid (over-fitted) null estimates, i.e. zero probabilities for words that never occur in a class's training data. In this work, we advocate reducing the parameter domain instead of smoothing the parameters. This leads to a constrained domain maximum likelihood estimation problem, for which we provide an iterative algorithm that solves it optimally.
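To make the contrast concrete, the sketch below compares three estimates of one class's word probabilities: plain maximum likelihood (relative frequencies, which yields null estimates), Laplace smoothing (one common heuristic, chosen here only as an example), and a constrained domain estimate. It rests on assumptions the abstract does not state: a multinomial word model, and a reduced domain given by a uniform lower bound `eps` on every probability. The function names are hypothetical, and the clamping loop is a standard way to solve this particular constrained problem, not necessarily the authors' algorithm.

```python
from typing import Dict, List, Set

Counts = Dict[str, int]
Probs = Dict[str, float]


def mle(counts: Counts, vocab: List[str]) -> Probs:
    """Unconstrained ML estimate: relative frequencies (unseen words get 0)."""
    total = sum(counts.get(w, 0) for w in vocab)
    return {w: counts.get(w, 0) / total for w in vocab}


def laplace(counts: Counts, vocab: List[str], alpha: float = 1.0) -> Probs:
    """Additive (Laplace) smoothing: one common heuristic against null estimates."""
    total = sum(counts.get(w, 0) for w in vocab) + alpha * len(vocab)
    return {w: (counts.get(w, 0) + alpha) / total for w in vocab}


def constrained_mle(counts: Counts, vocab: List[str], eps: float) -> Probs:
    """Maximise sum_w n_w * log p_w subject to sum_w p_w = 1 and p_w >= eps.

    Assumed (hypothetical) domain: the same lower bound eps for every word.
    """
    if eps * len(vocab) > 1.0:
        raise ValueError("infeasible domain: eps * |V| must not exceed 1")
    if sum(counts.get(w, 0) for w in vocab) == 0:
        raise ValueError("need at least one observed word")
    clamped: Set[str] = set()
    while True:
        free = [w for w in vocab if w not in clamped]
        free_mass = sum(counts.get(w, 0) for w in free)
        budget = 1.0 - eps * len(clamped)  # probability left for free words
        # Free words share the budget in proportion to their counts;
        # clamp every free word whose share would fall below eps.
        newly = {w for w in free if counts.get(w, 0) * budget < eps * free_mass}
        if not newly:
            break
        clamped |= newly
    return {
        w: eps if w in clamped else counts.get(w, 0) * budget / free_mass
        for w in vocab
    }


if __name__ == "__main__":
    vocab = ["a", "b", "c", "d"]
    counts = {"a": 6, "b": 3, "c": 1}  # "d" never occurs in this class
    print(mle(counts, vocab))
    print(laplace(counts, vocab))
    print(constrained_mle(counts, vocab, eps=0.1))
```

In this example, plain maximum likelihood gives the unseen word "d" probability 0. The constrained estimate clamps both "d" and the rare word "c" at `eps` and rescales the remaining words by their relative counts. Because the log-likelihood is concave and the scale `free_mass / budget` only grows when words are clamped, no clamped word ever needs to be released. When the loop stops, the Karush-Kuhn-Tucker conditions hold, so the result is the optimum over this assumed domain.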

EPrint Type: Article
Subjects: Learning/Statistics & Optimisation; Natural Language Processing; Theory & Algorithms
ID Code: 4560
Deposited By: Alfons Juan
Deposited On: 24 March 2009