A simple infinite topic mixture for rich graphs and relational data
Janne Sinkkonen, Juuso Parkkinen, Janne Aukia and Samuel Kaski
In: NIPS 2008 Workshop on Analyzing Graphs: Theory and Applications, 12 Dec 2008, Whistler, BC.
We propose a simple component or "topic" model for relational data, that is, for heterogeneous collections of co-occurrences between categorical variables. Graphs are a special case, as collections of dyadic co-occurrences (edges) over a set of vertices. The model is especially suitable for finding global components from collections of massively heterogeneous data, where encoding all the relations to a more sophisticated model becomes cumbersome, as well as for quick-and-dirty modeling of graphs enriched with, e.g., link properties or nodal attributes. The model is here estimated with collapsed Gibbs sampling, which allows sparse data structures and good memory efficiency for large data sets. Other inference methods should be straightforward to implement. We demonstrate the model with various medium-sized data sets (scientific citation data, MovieLens ratings, protein interactions), with brief comparisons to a full relational model and other approaches.
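The abstract does not give the model's equations, so the following is only a minimal sketch of what collapsed Gibbs sampling over graph edges can look like, not the authors' implementation. It makes several assumptions that are not in the abstract: a finite number of components `K` stands in for the infinite mixture, each edge has one latent component, both endpoints of an edge are drawn from that component's distribution over vertices, and the symmetric Dirichlet hyperparameters `alpha` and `beta` as well as the function name `gibbs_edge_components` are hypothetical. The per-component counts are kept in dictionaries to illustrate the sparse data structures the abstract mentions.

```python
import random


def gibbs_edge_components(edges, num_vertices, K=2, alpha=1.0, beta=0.1,
                          sweeps=50, seed=0):
    """Collapsed Gibbs sampling for a finite component model over edges.

    Assumption (not from the paper): K is fixed, component weights have a
    symmetric Dirichlet(alpha) prior, and each component's distribution
    over vertices has a symmetric Dirichlet(beta) prior. Both are
    integrated out, so only count statistics are stored.
    """
    rng = random.Random(seed)
    z = [rng.randrange(K) for _ in edges]   # component of each edge
    n = [0] * K                             # number of edges per component
    q = [dict() for _ in range(K)]          # sparse endpoint counts per component

    def add(k, i, j, delta):
        # Add or remove one edge (i, j) from the counts of component k.
        n[k] += delta
        q[k][i] = q[k].get(i, 0) + delta
        q[k][j] = q[k].get(j, 0) + delta

    for (i, j), k in zip(edges, z):
        add(k, i, j, +1)

    for _ in range(sweeps):
        for e, (i, j) in enumerate(edges):
            add(z[e], i, j, -1)             # remove the edge from its component
            same = 1 if i == j else 0       # a self-loop draws the same vertex twice
            weights = []
            for k in range(K):
                total = 2 * n[k] + num_vertices * beta
                w = (n[k] + alpha)
                w *= (q[k].get(i, 0) + beta) / total
                w *= (q[k].get(j, 0) + beta + same) / (total + 1)
                weights.append(w)
            z[e] = rng.choices(range(K), weights=weights)[0]
            add(z[e], i, j, +1)             # put the edge back under the new component
    return z, n


# Toy graph: two triangles over vertices 0-2 and 3-5.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
assignments, sizes = gibbs_edge_components(toy_edges, num_vertices=6)
print(assignments, sizes)
```

Because the component and vertex distributions are integrated out, memory use grows with the number of non-zero counts rather than with `K * num_vertices`, which is the kind of saving the abstract attributes to collapsed sampling.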
EPrint Type: Conference or Workshop Item (Paper)
Subjects: Theory & Algorithms
Deposited By: Samuel Kaski
Deposited On: 24 March 2009