Constrained domain maximum likelihood estimation for naive Bayes text classification
In text classification, the naive Bayes assumption greatly simplifies maximum likelihood estimation of the unknown class-conditional word occurrence probabilities. However, these estimates are usually modified by a heuristic parameter smoothing technique to avoid null (over-fitted) estimates, that is, zero probabilities for words never observed in a class. In this work, we advocate restricting the parameter domain instead of smoothing the parameters. This leads to a constrained domain maximum likelihood estimation problem, for which we provide an iterative algorithm that solves it optimally.
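To make the idea concrete, the following is a minimal sketch for a single class, not necessarily the algorithm developed in this work. It assumes that the reduced domain takes the form of a lower bound `eps` on every word probability; the abstract does not state the exact constraint, and the names `relative_frequency_mle`, `constrained_mle`, and `eps` are hypothetical. Under that assumption, the sketch repeatedly clamps to `eps` any estimate that falls below the bound and rescales the remaining relative frequencies over the leftover probability mass, until no estimate violates the bound.

```python
def relative_frequency_mle(counts, vocab):
    """Unconstrained MLE: relative frequencies, which are zero for unseen words."""
    total = sum(counts.get(w, 0) for w in vocab)
    if total == 0:
        raise ValueError("at least one word count must be positive")
    return {w: counts.get(w, 0) / total for w in vocab}


def constrained_mle(counts, vocab, eps):
    """Maximise sum_w n_w * log(p_w) subject to sum_w p_w = 1 and p_w >= eps.

    The lower bound eps is an assumed form of the domain constraint.
    """
    if eps * len(vocab) > 1:
        raise ValueError("eps is too large for the constraints to be feasible")
    if sum(counts.get(w, 0) for w in vocab) == 0:
        raise ValueError("at least one word count must be positive")

    clamped = set()  # words whose estimate is fixed at the bound
    while True:
        free = [w for w in vocab if w not in clamped]
        free_mass = 1.0 - eps * len(clamped)  # mass left for unclamped words
        free_total = sum(counts.get(w, 0) for w in free)

        probs = {w: eps for w in clamped}
        for w in free:
            probs[w] = free_mass * counts.get(w, 0) / free_total

        # Any free estimate below the bound is clamped in the next pass.
        violating = {w for w in free if probs[w] < eps}
        if not violating:
            return probs
        clamped |= violating


# Toy word counts for one class; "d" never occurs in the training data.
vocab = ["a", "b", "c", "d"]
counts = {"a": 6, "b": 3, "c": 1}
eps = 0.05

unconstrained = relative_frequency_mle(counts, vocab)
constrained = constrained_mle(counts, vocab, eps)
```

In this toy example the unconstrained estimate for "d" is zero, whereas the constrained estimate assigns it exactly `eps` and scales the other three relative frequencies by 0.95, so the result still sums to one. Unlike additive smoothing, the estimates of frequent words keep their relative proportions.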