PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Bayesian Sets
Zoubin Ghahramani and Katherine Heller
In: NIPS 2005, Vancouver, Canada(2005).

Abstract

Inspired by Google" Sets , we consider the problem of retrieving items from a concept or cluster, given a query consisting of a few items from that cluster. We formulate this as a Bayesian inference problem and describe a very simple algorithm for solving it. Our algorithm uses a modelbased concept of a cluster and ranks items using a score which evaluates the marginal probability that each item belongs to a cluster containing the query items. For exponential family models with conjugate priors this marginal probability is a simple function of sufficient statistics. We focus on sparse binary data and show that our score can be evaluated exactly using a single sparse matrix multiplication, making it possible to apply our algorithm to very large datasets. We evaluate our algorithm on three datasets: retrieving movies from EachMovie, finding completions of author sets from the NIPS dataset, and finding completions of sets of words appearing in the Grolier encyclopedia. We compare to Google" Sets and show that Bayesian Sets gives very reasonable set completions.

PDF - Requires Adobe Acrobat Reader or other PDF viewer.
EPrint Type:Conference or Workshop Item (Paper)
Project Keyword:Project Keyword UNSPECIFIED
Subjects:Learning/Statistics & Optimisation
Theory & Algorithms
Information Retrieval & Textual Information Access
ID Code:1153
Deposited By:Katherine Heller
Deposited On:18 February 2006