The Minimum-Information Principle for Discriminative Learning

Abstract

Exponential models of distributions are widely used in machine learning for classification and modelling. It is well known that they can be interpreted as maximum entropy models under empirical expectation constraints. In this work, we argue that for classification tasks, mutual information is the correct information-theoretic measure to be optimized. We show how the principle of minimum mutual information generalizes that of maximum entropy, and provides a comprehensive framework for building discriminative classifiers. We introduce an iterative algorithm for finding such classifiers, which generalizes the Blahut-Arimoto algorithm for calculating the rate-distortion function. The algorithm is also applicable to complex multivariate data, and can be used to analyse graphical models with partially observed expectation values. We discuss generalization bounds for our method, and demonstrate its performance on various classification tasks.
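For readers unfamiliar with the Blahut-Arimoto algorithm mentioned in the abstract, the following is a minimal sketch of the classical rate-distortion version only. It is not the generalized algorithm of this work, whose details the abstract does not give. The function name `blahut_arimoto`, its parameters, and the trade-off parameter `beta` are illustrative assumptions.

```python
import math


def blahut_arimoto(p_x, distortion, beta, n_iters=200, tol=1e-12):
    """Classical Blahut-Arimoto iteration for one point on a rate-distortion curve.

    Illustrative sketch; all names here are hypothetical.
    p_x        -- source probabilities p(x), a list of length n
    distortion -- n x m nested list with entries d(x, x_hat)
    beta       -- positive trade-off parameter (larger beta favours low distortion)
    Returns (rate_in_bits, expected_distortion, q_cond),
    where q_cond[x][k] = q(x_hat = k | x).
    """
    n, m = len(p_x), len(distortion[0])
    q_marg = [1.0 / m] * m  # start from a uniform reproduction marginal
    q_cond = []
    for _ in range(n_iters):
        # Step 1: q(x_hat | x) is proportional to q(x_hat) * exp(-beta * d(x, x_hat)).
        q_cond = []
        for x in range(n):
            w = [q_marg[k] * math.exp(-beta * distortion[x][k]) for k in range(m)]
            z = sum(w)
            q_cond.append([wk / z for wk in w])
        # Step 2: q(x_hat) = sum_x p(x) q(x_hat | x).
        new_marg = [sum(p_x[x] * q_cond[x][k] for x in range(n)) for k in range(m)]
        change = max(abs(a - b) for a, b in zip(new_marg, q_marg))
        q_marg = new_marg
        if change < tol:
            break

    # Mutual information I(X; X_hat) in bits and the expected distortion
    # under the joint distribution p(x) q(x_hat | x).
    rate = 0.0
    expected_d = 0.0
    for x in range(n):
        for k in range(m):
            q = q_cond[x][k]
            if q > 0.0:
                rate += p_x[x] * q * math.log2(q / q_marg[k])
            expected_d += p_x[x] * q * distortion[x][k]
    return rate, expected_d, q_cond


# Usage: a fair binary source with Hamming distortion.
rate, dist, q_cond = blahut_arimoto([0.5, 0.5], [[0.0, 1.0], [1.0, 0.0]], beta=2.0)
```

Each iteration alternates between updating the conditional distribution and its marginal, so the mutual information being minimized never increases. Sweeping `beta` over a range of positive values traces out different points on the rate-distortion curve.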