PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Constrained domain maximum likelihood estimation for naive Bayes text classification
Jesús Andrés-Ferrer and Alfons Juan
Pattern Analysis and Applications, Volume 13, Number 2, pp. 189-196, 2010.

Abstract

The naive Bayes assumption in text classification has the advantage of greatly simplifying maximum likelihood estimation of unknown class-conditional word occurrence probabilities. However, these estimates are usually modified by application of a heuristic parameter smoothing technique to avoid (over-fitted) null estimates. In this work, we advocate the reduction of the parameter domain instead of parameter smoothing. This leads to a constrained maximum likelihood estimation problem for which we provide an iterative algorithm that solves it optimally.
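
The abstract does not state the constraint or the algorithm, so the following is only a minimal sketch under explicit assumptions: the reduced parameter domain is taken to be a lower bound eps on every class-conditional word probability, and the iterative scheme shown (clamp estimates that fall below eps, renormalise the remaining ones, repeat) is one standard way to solve that assumed problem, not necessarily the authors' procedure. The names constrained_mle, counts and eps are hypothetical.

```python
def constrained_mle(counts, eps):
    """Sketch: maximise sum_w counts[w] * log(p[w]) for one class,
    subject to sum(p) == 1 and p[w] >= eps for every word w.

    ASSUMPTION: the lower-bound constraint and this clamping scheme are
    illustrative; the abstract does not specify either of them.
    """
    vocab_size = len(counts)
    if sum(counts) <= 0:
        raise ValueError("at least one word count must be positive")
    if eps < 0 or eps * vocab_size > 1.0:
        raise ValueError("need 0 <= eps <= 1 / vocabulary size")

    clamped = set()  # indices of words fixed at the lower bound eps
    while True:
        # Probability mass left for the words that are not clamped.
        free_mass = 1.0 - eps * len(clamped)
        free_total = sum(c for w, c in enumerate(counts) if w not in clamped)

        # Relative-frequency estimate, rescaled to the remaining mass.
        probs = [
            eps if w in clamped else c * free_mass / free_total
            for w, c in enumerate(counts)
        ]

        # Any free estimate below eps violates the constraint: clamp it.
        violators = {
            w for w, p in enumerate(probs) if w not in clamped and p < eps
        }
        if not violators:
            return probs
        clamped |= violators
```

For example, constrained_mle([8, 2, 0, 0], 0.05) gives approximately (0.72, 0.18, 0.05, 0.05): the two unseen words are held at eps instead of receiving null estimates, and the seen words share the remaining 0.9 of probability mass in proportion to their counts.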

EPrint Type: Article
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics; Natural Language Processing
ID Code: 7428
Deposited By: Alfons Juan
Deposited On: 17 March 2011