Infinite mixtures for multi-relational categorical data

Abstract

Large relational datasets are prevalent in many fields. We propose an unsupervised component model for relational data, i.e., for heterogeneous collections of categorical co-occurrences. The co-occurrences can be dyadic or n-adic, and over the same or different categorical variables. Graphs are a special case, as collections of dyadic co-occurrences (edges) over a set of vertices. The model is simple, with only one latent variable. This makes it widely applicable, provided that a global latent component solution is preferred and the generative process fits the application. Estimation with a collapsed Gibbs sampler is straightforward. We demonstrate the model with graphs enriched with multinomial vertex properties, or more concretely, with two sets of scientific papers for which both content and citation information are available.
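The abstract gives no equations, so the following is only a minimal sketch of what a collapsed Gibbs sampler for the graph special case might look like, not the authors' implementation. It assumes a Chinese restaurant process prior over components (concentration `alpha`), one multinomial over vertices per component with a symmetric Dirichlet prior (`beta`), and that both endpoints of an edge are drawn from the component of that edge. The function name `collapsed_gibbs` and all parameter names and defaults are hypothetical.

```python
import random
from collections import defaultdict


def collapsed_gibbs(edges, num_vertices, alpha=1.0, beta=0.1, sweeps=50, seed=0):
    """Assumed model: CRP mixture over edges, Dirichlet-multinomial over vertices.

    Returns one integer component label per edge.
    """
    rng = random.Random(seed)
    z = [None] * len(edges)                    # component label of each edge
    n = defaultdict(int)                       # n[k]: edges currently in component k
    q = defaultdict(lambda: defaultdict(int))  # q[k][v]: endpoints of v in component k
    next_label = 0
    m_beta = num_vertices * beta

    def weight(k, i, j):
        # Unnormalized conditional probability of putting edge (i, j) in k,
        # with the component multinomials integrated out (collapsed).
        same = 1 if i == j else 0
        if k is None:  # a brand-new component
            return alpha * beta * (beta + same) / (m_beta * (m_beta + 1))
        total = 2 * n[k]  # every edge contributes two endpoints
        return (n[k] * (q[k][i] + beta) * (q[k][j] + same + beta)
                / ((total + m_beta) * (total + 1 + m_beta)))

    for _ in range(sweeps):
        for e, (i, j) in enumerate(edges):
            k_old = z[e]
            if k_old is not None:
                # Remove the edge from the sufficient statistics.
                n[k_old] -= 1
                q[k_old][i] -= 1
                q[k_old][j] -= 1
                if n[k_old] == 0:
                    del n[k_old]
                    del q[k_old]
            labels = list(n.keys())
            weights = [weight(k, i, j) for k in labels] + [weight(None, i, j)]
            choice = rng.choices(range(len(weights)), weights=weights)[0]
            if choice == len(labels):
                k_new = next_label
                next_label += 1
            else:
                k_new = labels[choice]
            # Add the edge back under its newly sampled component.
            z[e] = k_new
            n[k_new] += 1
            q[k_new][i] += 1
            q[k_new][j] += 1
    return z


if __name__ == "__main__":
    # Toy graph: two triangles joined by a single edge (illustrative data only).
    toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    print(collapsed_gibbs(toy_edges, num_vertices=6))
```

The sketch covers dyadic co-occurrences only. Under the same assumptions, n-adic co-occurrences or vertex properties would add further Dirichlet-multinomial factors to `weight`.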