PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

The Minimum-Information Principle for Discriminative Learning
Amir Globerson and Naftali Tishby
Proceedings of Uncertainty in Artificial Intelligence (UAI-2004), 2004.


Exponential models of distributions are widely used in machine learning for classification and modelling. It is well known that they can be interpreted as maximum entropy models under empirical expectation constraints. In this work, we argue that for classification tasks, mutual information is the correct information-theoretic measure to optimize. We show how the principle of minimum mutual information generalizes that of maximum entropy and provides a comprehensive framework for building discriminative classifiers. We introduce an iterative algorithm for finding such classifiers, which generalizes the Blahut-Arimoto algorithm for calculating the rate-distortion function. The algorithm is also applicable to complex multivariate data and can be used to analyse graphical models with partially observed expectation values. We discuss generalization bounds for our method and demonstrate its performance on various classification tasks.
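The abstract describes the proposed algorithm as a generalization of the Blahut-Arimoto algorithm for the rate-distortion function. For orientation only, here is a minimal Python/NumPy sketch of that classic Blahut-Arimoto iteration; it is not the authors' minimum-information algorithm, whose update rules the abstract does not give. The function name `blahut_arimoto_rd`, the multiplier `beta` (trading rate against distortion), the uniform initialization and the stopping tolerance are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def blahut_arimoto_rd(p_x, dist, beta, n_iter=500, tol=1e-12):
    """Classic Blahut-Arimoto iteration for one point on the rate-distortion curve.

    Hypothetical helper for illustration (not the paper's MinMI algorithm).
    p_x  : source distribution p(x), shape (n,)
    dist : distortion matrix d(x, x_hat), shape (n, m)
    beta : positive Lagrange multiplier trading rate against distortion
    Returns (rate in bits, expected distortion, q(x_hat | x)).
    """
    p_x = np.asarray(p_x, dtype=float)
    dist = np.asarray(dist, dtype=float)
    m = dist.shape[1]
    q_xhat = np.full(m, 1.0 / m)      # start from a uniform reproduction marginal
    kernel = np.exp(-beta * dist)     # exp(-beta * d(x, x_hat)), shape (n, m)

    for _ in range(n_iter):
        # Conditional update: q(x_hat | x) proportional to q(x_hat) * exp(-beta * d(x, x_hat))
        unnorm = kernel * q_xhat[None, :]
        q_cond = unnorm / unnorm.sum(axis=1, keepdims=True)
        # Marginal update: q(x_hat) = sum_x p(x) q(x_hat | x)
        new_q = p_x @ q_cond
        converged = np.max(np.abs(new_q - q_xhat)) < tol
        q_xhat = new_q
        if converged:
            break

    # Recompute the conditional from the final marginal so rate and distortion match
    unnorm = kernel * q_xhat[None, :]
    q_cond = unnorm / unnorm.sum(axis=1, keepdims=True)
    distortion = float(np.sum(p_x[:, None] * q_cond * dist))
    # Mutual information I(X; X_hat) in bits; zero-probability entries contribute nothing
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(q_cond > 0, q_cond / q_xhat[None, :], 1.0)
    rate = float(np.sum(p_x[:, None] * q_cond * np.log2(ratio)))
    return rate, distortion, q_cond


if __name__ == "__main__":
    # Uniform binary source with Hamming distortion
    p = np.array([0.5, 0.5])
    d = np.array([[0.0, 1.0], [1.0, 0.0]])
    for b in (0.5, 1.0, 2.0, 4.0):
        R, D, _ = blahut_arimoto_rd(p, d, beta=b)
        print(f"beta={b}: D={D:.4f}, R={R:.4f} bits")
```

For the uniform binary source with Hamming distortion used in the usage block, each returned pair should lie on the known curve R(D) = 1 - h(D), with D = 1/(1 + e^beta), which makes a convenient sanity check. The sketch is not hardened against underflow of exp(-beta d) for very large beta.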

EPrint Type: Article
Additional Information: A new information-theoretic inference principle for learning and data analysis.
Subjects: Computational, Information-Theoretic Learning with Statistics; Learning/Statistics & Optimisation; Theory & Algorithms
ID Code: 871
Deposited By: Naftali Tishby
Deposited On: 14 January 2006