The Minimum Information Principle for Discriminative Learning
Amir Globerson and Naftali Tishby
Proceedings of Uncertainty in Artificial Intelligence (UAI-2004)
Exponential models of distributions are widely used in machine
learning for classification and modelling. It is well
known that they can be interpreted as maximum entropy models
under empirical expectation constraints. In this work, we argue that
for classification tasks, mutual information
is the appropriate information-theoretic measure to minimize.
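For context, the maximum-entropy reading of exponential models can be stated as follows (a standard textbook formulation, not quoted from the paper): given feature functions \phi_i and their empirical means \hat{a}_i,

\[
\max_{p} \; H(p) \quad \text{s.t.} \quad \mathbb{E}_{p}[\phi_i(X)] = \hat{a}_i, \qquad i = 1, \dots, k,
\]

whose solution takes the exponential form p(x) \propto \exp\left( \sum_i \lambda_i \phi_i(x) \right).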
We show how the principle of minimum mutual information
generalizes that of maximum entropy, and provides
a comprehensive framework for building discriminative classifiers.
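One way the minimum mutual information problem might be posed (our paraphrase; the exact constraint set is defined in the paper) is to minimize I(X;Y) over class-conditional distributions subject to class-conditional expectation constraints:

\[
\min_{\{p(x \mid y)\}} \; I(X; Y) \quad \text{s.t.} \quad \mathbb{E}_{p(x \mid y)}[\phi_i(X)] = a_{iy} \quad \forall i, y,
\]

with p(y) held fixed at its empirical value; the resulting conditional p(y \mid x) then serves as the classifier.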
We introduce an iterative algorithm for finding such classifiers that generalizes the Blahut-Arimoto algorithm for
computing the rate-distortion function. The algorithm
is also applicable to complex multivariate data,
and can be used to analyse graphical models
with partially observed expectation values.
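Since the paper's iteration generalizes the classical Blahut-Arimoto algorithm, a minimal NumPy sketch of the classical rate-distortion version may help as a point of reference (the function name, parameters, and the binary example are illustrative, not taken from the paper):

import numpy as np

def blahut_arimoto(p_x, dist, beta, n_iters=500, tol=1e-10):
    """One point on the rate-distortion curve via classical Blahut-Arimoto.

    p_x  : (n,) source distribution over symbols x
    dist : (n, m) distortion matrix d(x, x_hat)
    beta : trade-off multiplier; larger beta favours lower distortion
    """
    n, m = dist.shape
    q_xhat = np.full(m, 1.0 / m)             # reproduction marginal q(x_hat)
    q_cond = np.full((n, m), 1.0 / m)        # test channel q(x_hat | x)
    for _ in range(n_iters):
        # Update the test channel: q(x_hat | x) proportional to q(x_hat) exp(-beta d)
        log_q = np.log(q_xhat)[None, :] - beta * dist
        log_q -= log_q.max(axis=1, keepdims=True)   # numerical stability
        q_cond = np.exp(log_q)
        q_cond /= q_cond.sum(axis=1, keepdims=True)
        # Update the marginal: q(x_hat) = sum_x p(x) q(x_hat | x)
        q_new = p_x @ q_cond
        if np.max(np.abs(q_new - q_xhat)) < tol:
            q_xhat = q_new
            break
        q_xhat = q_new
    # Rate I(X; X_hat) in nats, and the expected distortion at convergence
    rate = np.sum(p_x[:, None] * q_cond * np.log(q_cond / q_xhat[None, :]))
    distortion = np.sum(p_x[:, None] * q_cond * dist)
    return rate, distortion, q_cond

# Example: uniform binary source with Hamming distortion
p = np.array([0.5, 0.5])
d = 1.0 - np.eye(2)
R, D, _ = blahut_arimoto(p, d, beta=2.0)
print(f"rate={R:.4f} nats, distortion={D:.4f}")

On this binary-Hamming example the iteration converges in a handful of steps; sweeping beta traces out the rate-distortion curve.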
We discuss generalization bounds for our method, and demonstrate its performance on various classification tasks.