PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Infinite mixtures for multi-relational categorical data
Janne Sinkkonen, Janne Aukia and Samuel Kaski
In: MLG-2008: 6th International Workshop on Mining and Learning with Graphs, 4-5 July 2008, Helsinki, Finland.

Abstract

Large relational datasets are prevalent in many fields. We propose an unsupervised component model for relational data, i.e., for heterogeneous collections of categorical co-occurrences. The co-occurrences can be dyadic or n-adic, and over the same or different categorical variables. Graphs are a special case, as collections of dyadic co-occurrences (edges) over a set of vertices. The model is simple, with only one latent variable. This allows wide applicability as long as a global latent component solution is preferred, and the generative process fits the application. Estimation with a collapsed Gibbs sampler is straightforward. We demonstrate the model with graphs enriched with multinomial vertex properties, or more concretely, with two sets of scientific papers, with both content and citation information available.
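
To make the estimation idea concrete for the graph special case, the following is a minimal sketch of a collapsed Gibbs sampler, written under assumptions that are not stated in the abstract. It assumes that each edge carries a single latent component label drawn from a Chinese restaurant process with concentration `alpha`, and that both endpoints of the edge are drawn independently from that component's multinomial over vertices, which has a symmetric Dirichlet prior with parameter `beta`. The function name `gibbs_edge_mixture` and all parameter names are hypothetical, and the paper's actual model and sampler may differ.

```python
import random


def gibbs_edge_mixture(edges, n_vertices, alpha=1.0, beta=0.1, sweeps=50, seed=0):
    """Collapsed Gibbs sampling for a CRP mixture over edges (illustrative).

    Assumed generative process, not taken from the paper: each edge picks a
    component from a Chinese restaurant process, then draws both endpoints
    from that component's multinomial over vertices. The multinomials are
    integrated out, so only counts are tracked.
    """
    rng = random.Random(seed)
    z = [None] * len(edges)  # component label of each edge
    n = {}                   # n[k]: number of edges assigned to component k
    q = {}                   # q[(k, v)]: endpoint count of vertex v in component k
    next_label = 0           # label reserved for a not-yet-used component

    def weight(k, i, j):
        # Dirichlet-multinomial predictive probability of endpoints (i, j)
        # given the endpoints already assigned to component k.
        total = 2 * n.get(k, 0) + n_vertices * beta
        qi = q.get((k, i), 0) + beta
        qj = q.get((k, j), 0) + beta + (1 if i == j else 0)
        return qi * qj / (total * (total + 1))

    # The first pass assigns edges sequentially; later passes resample them.
    for _ in range(sweeps + 1):
        for e, (i, j) in enumerate(edges):
            k_old = z[e]
            if k_old is not None:
                # Remove edge e from the counts before resampling its label.
                n[k_old] -= 1
                q[(k_old, i)] -= 1
                q[(k_old, j)] -= 1
                if n[k_old] == 0:
                    del n[k_old]
            labels = list(n) + [next_label]
            weights = [n[k] * weight(k, i, j) for k in n]
            weights.append(alpha * weight(next_label, i, j))
            k_new = rng.choices(labels, weights)[0]
            if k_new == next_label:
                next_label += 1
            z[e] = k_new
            n[k_new] = n.get(k_new, 0) + 1
            q[(k_new, i)] = q.get((k_new, i), 0) + 1
            q[(k_new, j)] = q.get((k_new, j), 0) + 1
    return z
```

Calling `gibbs_edge_mixture([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)], n_vertices=6)` returns one integer component label per edge. The result is a single sample from the sampler's final sweep, so in practice several samples would typically be collected and summarized. Multinomial vertex properties, such as document content, are not covered by this sketch.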

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Theory & Algorithms
ID Code: 5076
Deposited By: Samuel Kaski
Deposited On: 24 March 2009