Variational Information-Maximization and Conditional Self-Supervised Training
In: Mathematical Foundations of Learning Theory -II, 31 May - 3 Jun 2006, Paris, France.
In this work we investigate the relation between the variational Arimoto-Blahut (information-maximization, IM) algorithm for the channel capacity of encoder models and variational EM for generative models and stochastic autoencoders. Much of the previous work relating maximization of mutual information, likelihood, and conditional likelihood in such models focused primarily on specifically constrained invertible mappings (e.g. Cardoso, 1997; MacKay, 1999) or specific noiseless autoencoders (Oja, 1989), where the computations were exact. Our goal here is to investigate the relations between these learning paradigms for more general graphical models, which are arguably more practical for describing real-world communication channels or data-generating processes. Since in our case the optimized objectives are generally computationally intractable, we consider their common variational relaxations. Our focus on variational EM and IM is due to the popularity and simplicity of these approaches for approximate training of generally intractable graphical models.
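The variational relaxation on the IM side replaces the intractable exact posterior of the encoder model with a tractable variational decoder q(x|y), giving the lower bound I(X;Y) >= H(X) + E_{p(x,y)}[log q(x|y)], which is tight when q(x|y) matches the true posterior p(x|y). A minimal numerical sketch of this bound for a toy discrete channel follows; the distributions and variable names are illustrative assumptions, not code from the paper:

```python
import numpy as np

# Illustrative sketch (an assumption, not the paper's code): the variational
# lower bound on mutual information,
#   I(X;Y) >= H(X) + E_{p(x,y)}[log q(x|y)],
# evaluated exactly on a small discrete channel.

p_x = np.array([0.5, 0.5])               # source distribution p(x)
p_y_given_x = np.array([[0.9, 0.1],      # channel p(y|x); rows index x
                        [0.2, 0.8]])

p_xy = p_x[:, None] * p_y_given_x        # joint p(x, y)
p_y = p_xy.sum(axis=0)                   # marginal p(y)
posterior = p_xy / p_y                   # exact posterior p(x|y), columns over y

def mutual_information(p_xy):
    """Exact I(X;Y) for a discrete joint distribution."""
    p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
    return float((p_xy * np.log(p_xy / np.outer(p_x, p_y))).sum())

def im_lower_bound(p_xy, q_x_given_y):
    """Variational bound H(X) + E_{p(x,y)}[log q(x|y)] for a decoder q."""
    p_x = p_xy.sum(axis=1)
    h_x = -float((p_x * np.log(p_x)).sum())
    return h_x + float((p_xy * np.log(q_x_given_y)).sum())

true_mi = mutual_information(p_xy)
bound_opt = im_lower_bound(p_xy, posterior)          # q = exact posterior
bound_unif = im_lower_bound(p_xy, np.full((2, 2), 0.5))  # naive uniform q
```

With the exact posterior as decoder the bound attains the true mutual information, while the uniform decoder gives a strictly looser value; in intractable models the decoder is instead parameterized and optimized, which is what makes the relaxation a practical training objective.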