Information Theoretic Methods for Learning Generative Models for Relational Structures
In: SIG-11: Second International Workshop on Stochastic Image Grammars, 12 November 2011, Barcelona, Spain.
This talk focusses on work aimed at developing a principled probabilistic and information theoretic framework for learning generative models of relational structure. The aim is to develop methods that can be used to learn models that can capture the variability present in graph-structures used to represent shapes or arrangements of shape-primitives in images. Here nodes represent the parts of shape-primitives representing an object, and the edges represent the relationships which prevail between the parts. The aim is to learn the relationships from examples. Of course such structures can exhibit variability in the arrangement of parts, and the data used in training can be subject to uncertainty. It hence represents a demanding learning problem, for which there is limited available methodology.
Whereas most of traditional pattern recognition and machine learning is concerned with pattern vectors, the issue of how to capture variability in graph, tree or string representations has received relatively little attention in the literature. The main reason for the lack of progress is the difficulty in developing representations that can capture variations in graph-structure. This variability can be attributed to a) variations in either node or edge attributes, b) variations in node or edge composition and c) variations in edge-connectivity. This trichotomy provides a natural framework for analyzing the state-of-the-art in the literature. Most of the work on Bayes nets in the graphical models literature can be viewed as modeling variations in node or edge attributes. Examples also include the work aimed at using Gaussian models to capture variations in edge attributes. The problems of modeling variations in node and edge composition are more challenging since they focus on modeling the structure of the graph rather than its attributes.
The problem of learning edge structure is probably the most challenging of those listed above. Broadly speaking there are two approaches to characterizing variations in edge structure for graphs. The first of these is graph spectral, while the second is probabilistic. In the case of graph spectra, many of the ideas developed in the generative modeling of shape using principal components analysis can be translated relatively directly to graphs using simple vectorization procedures based on the correspondences conveyed by the ordering of Laplacian eigenvectors. Although these methods are simple and effective, they are limited by the stability of the Laplacian spectrum under perturbations in graph structure. The probabilistic approach is potentially more robust, but requires accurate correspondence information to be inferred from the available graph structure. If this is to hand, then a representation of edge structure can be learned. To date the most effective algorithm falling into this category exploits a part-based representation.
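The sensitivity of the Laplacian spectrum to structural change can be illustrated with a small numerical sketch. It is not the vectorization procedure used in the work described here; it simply assumes an unweighted, undirected graph given as an adjacency matrix, and uses the sorted eigenvalues of the combinatorial Laplacian, zero-padded to a fixed length, as a pattern vector. The function names `laplacian_spectrum` and `spectral_feature` are hypothetical.

```python
import numpy as np

def laplacian_spectrum(adj):
    """Eigenvalues of the combinatorial Laplacian L = D - A, in ascending order."""
    adj = np.asarray(adj, dtype=float)
    lap = np.diag(adj.sum(axis=1)) - adj
    return np.linalg.eigvalsh(lap)  # symmetric input, so eigenvalues are real

def spectral_feature(adj, length):
    """Fixed-length pattern vector: largest eigenvalues first, zero-padded.

    Assumption: ordering by eigenvalue stands in for node correspondence.
    """
    spec = laplacian_spectrum(adj)[::-1][:length]
    return np.pad(spec, (0, length - len(spec)))

# A 4-node cycle, and the 4-node path obtained by deleting edge (0, 3).
cycle = np.array([[0, 1, 0, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 1, 0]])
path = cycle.copy()
path[0, 3] = path[3, 0] = 0

f_cycle = spectral_feature(cycle, 6)
f_path = spectral_feature(path, 6)
shift = np.linalg.norm(f_cycle - f_path)  # how far one edge deletion moves the vector
print(f_cycle, f_path, shift)
```

A single edge deletion moves several entries of the pattern vector at once, which is the kind of instability that limits methods built on the ordering of the spectrum.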
In this talk, we focus on the third problem and aim to learn a generative model that can be used to describe the distribution of structural variations present in a set of sample graphs, and in particular to characterize the variations of the edge structure present in the set. We follow recent work by Torsello and Hancock, and pose the problem as that of learning a generative supergraph representation from which we can sample. However, their work is based on trees, and since the trees are rooted the learning process can be effected by performing tree merging operations in polynomial time. This greedy strategy does not translate tractably to graphs, where the complexity becomes exponential, and we require different strategies for learning and sampling. Torsello and Hancock realize both using edit operations; here, on the other hand, we use a soft-assign method for optimization and then generate new instances by Gibbs sampling. We take an information theoretic approach to estimating the supergraph structure by using a minimum description length (MDL) criterion. By taking into account the overall code-length in the model, MDL allows us to select a supergraph representation that trades off goodness-of-fit with the observed sample graphs against the complexity of the model. We adopt the probabilistic model to furnish the required learning framework and encode the complexity of the supergraph using its von Neumann entropy (i.e. the entropy of its normalized Laplacian eigenvalues). Finally, a variant of the EM algorithm is developed to minimize the total code-length criterion, in which the correspondences between the nodes of the sample graphs and those of the supergraph are treated as missing data. In the maximization step, we update both the node correspondence information and the structure of the supergraph using graduated assignment.
This novel technique is applied to a large database of object views, and used to learn class prototypes that can be used for the purposes of object recognition.
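The complexity term mentioned above, the entropy of the normalized Laplacian eigenvalues, can be sketched in a few lines. The abstract does not state how the eigenvalues are scaled before the entropy is taken; this sketch assumes they are divided by their sum so that they form a probability distribution, which may differ from the scaling used in the actual work. The function names `normalized_laplacian` and `von_neumann_entropy` are hypothetical.

```python
import numpy as np

def normalized_laplacian(adj):
    """Normalized Laplacian I - D^{-1/2} A D^{-1/2} of an undirected graph."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    connected = deg > 0
    inv_sqrt[connected] = 1.0 / np.sqrt(deg[connected])
    lap = np.eye(len(adj)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    # Convention: isolated nodes contribute a zero row and column.
    isolated = np.flatnonzero(~connected)
    lap[isolated, isolated] = 0.0
    return lap

def von_neumann_entropy(adj):
    """Shannon entropy (in nats) of the rescaled normalized Laplacian spectrum.

    Assumption: eigenvalues are divided by their sum; the source does not
    specify the scaling.
    """
    eigvals = np.linalg.eigvalsh(normalized_laplacian(adj))
    eigvals = np.clip(eigvals, 0.0, None)  # remove tiny negative round-off
    total = eigvals.sum()
    if total == 0.0:
        return 0.0  # graph with no edges
    p = eigvals / total
    p = p[p > 1e-12]  # treat 0 * log 0 as 0
    return float(-(p * np.log(p)).sum())

triangle = np.array([[0, 1, 1],
                     [1, 0, 1],
                     [1, 1, 0]])
single_edge = np.array([[0, 1],
                        [1, 0]])

h_triangle = von_neumann_entropy(triangle)
h_edge = von_neumann_entropy(single_edge)
print(h_triangle, h_edge)
```

Under this scaling a single edge has zero entropy and the triangle has entropy ln 2, so the measure grows as the structure becomes richer, which is the behaviour a complexity penalty in a code-length criterion needs.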