PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Variational Information-Maximization and Conditional Self-Supervised Training
Felix Agakov
In: Mathematical Foundations of Learning Theory -II, 31 May - 3 Jun 2006, Paris, France.


In this work we investigate the relation between the variational Arimoto-Blahut (Information-Maximizing, IM) algorithm for the channel capacity of encoder models and variational EM for generative models and stochastic autoencoders. Much of the previous work relating maximization of mutual information, likelihood, and conditional likelihood in such models focused on specifically constrained invertible mappings (e.g. Cardoso, 1997; MacKay, 1999) or specific noiseless autoencoders (Oja, 1989), where the computations were exact. Our goal here is to investigate relations between these learning paradigms for more general graphical models, which are arguably more practical for describing real-world communication channels or data-generating processes. Since the optimized objectives are generally computationally intractable in this setting, we consider their common variational relaxations. We focus on variational EM and IM because of the popularity and simplicity of these approaches for approximate training of generally intractable graphical models.
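
As a rough illustration of the kind of variational relaxation referred to above, the sketch below checks a commonly used variational lower bound on mutual information, I(x, y) >= H(x) + E_{p(x,y)}[log q(x|y)], on a small discrete channel. The abstract does not state which form of the bound is used, so this choice is an assumption; the function names, the two-symbol source, and the channel probabilities are likewise hypothetical and chosen only for illustration.

```python
import math


def marginal_y(p_x, p_y_given_x):
    """Marginal p(y) for a discrete source p(x) and channel p(y|x)."""
    n_x, n_y = len(p_x), len(p_y_given_x[0])
    return [sum(p_x[i] * p_y_given_x[i][j] for i in range(n_x)) for j in range(n_y)]


def mutual_information(p_x, p_y_given_x):
    """Exact mutual information I(x, y) in nats."""
    p_y = marginal_y(p_x, p_y_given_x)
    mi = 0.0
    for i, px in enumerate(p_x):
        for j, py in enumerate(p_y):
            joint = px * p_y_given_x[i][j]
            if joint > 0:
                mi += joint * math.log(p_y_given_x[i][j] / py)
    return mi


def im_lower_bound(p_x, p_y_given_x, q_x_given_y):
    """H(x) + E_{p(x,y)}[log q(x|y)], with q_x_given_y[j][i] = q(x=i | y=j)."""
    entropy_x = -sum(px * math.log(px) for px in p_x if px > 0)
    expected_log_q = 0.0
    for i, px in enumerate(p_x):
        for j in range(len(p_y_given_x[0])):
            joint = px * p_y_given_x[i][j]
            if joint > 0:
                expected_log_q += joint * math.log(q_x_given_y[j][i])
    return entropy_x + expected_log_q


def exact_posterior(p_x, p_y_given_x):
    """Exact decoder p(x|y), returned in the same [y][x] layout as q_x_given_y."""
    p_y = marginal_y(p_x, p_y_given_x)
    return [
        [p_x[i] * p_y_given_x[i][j] / p_y[j] for i in range(len(p_x))]
        for j in range(len(p_y))
    ]


# Hypothetical example: a uniform binary source sent through a noisy binary channel.
p_x = [0.5, 0.5]
channel = [[0.9, 0.1], [0.2, 0.8]]      # channel[i][j] = p(y=j | x=i)
uniform_q = [[0.5, 0.5], [0.5, 0.5]]    # a deliberately poor decoder

mi = mutual_information(p_x, channel)
loose = im_lower_bound(p_x, channel, uniform_q)
tight = im_lower_bound(p_x, channel, exact_posterior(p_x, channel))
print(mi, loose, tight)
```

In this toy setting the bound is loose for the uninformative decoder and coincides with the exact mutual information when q(x|y) equals the true posterior p(x|y). For the more general graphical models the abstract has in mind, the exact posterior is typically unavailable, which is where a restricted variational family for q would come in.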

EPrint Type: Conference or Workshop Item (Poster)
Subjects: Computational, Information-Theoretic Learning with Statistics; Learning/Statistics & Optimisation
ID Code: 2518
Deposited By: Felix Agakov
Deposited On: 22 November 2006