PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Nonparametric mixed membership models using the IBP compound Dirichlet process
Sinead Williamson, Chong Wang, Katherine Heller and David M. Blei
In: Mixture Estimation and Applications (2011) Wiley, pp. 145-160.

Abstract

Often the assumptions of mixture modelling, namely that each data point belongs to one of a finite or countable number of distributions, are overly restrictive. In many real-life datasets, individual data points exhibit features associated with multiple clusters: a movie may contain elements of both romance and comedy, or an individual member of a population may exhibit traits from multiple subpopulations. Mixed membership models are a hierarchical variant of mixture models used for modelling grouped data, where each individual data point consists of a collection of observations. Rather than being assigned to a single component, each data point is associated with a distribution over components, allowing us to capture more complicated relationships between data points than is possible with a simple mixture model. One example of a dataset where a mixed membership assumption is appropriate is a corpus of text documents: each document is a data point and consists of a collection of words. In such an application, each component of the mixture model is a distribution over words, each document is associated with a distribution over these components, and each word is associated with a single component. This framework is often referred to as ‘topic modelling’, since we typically find that the posterior components (called ‘topics’) reflect the semantic themes of the documents.
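
The generative structure described in the abstract can be illustrated with a minimal sketch. It assumes a fixed, finite number of topics with symmetric Dirichlet priors, which is a deliberate simplification and not the nonparametric IBP compound Dirichlet process construction developed in the chapter. The function names, the hyperparameters alpha and beta, and the toy vocabulary are all hypothetical and chosen only for illustration.

```python
import random


def sample_dirichlet(rng, concentration, size):
    """Draw from a symmetric Dirichlet by normalising independent Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(num_docs, doc_len, num_topics, vocab,
                    alpha=1.0, beta=1.0, seed=0):
    """Sample a toy corpus from a finite mixed membership (topic) model.

    Assumption: num_topics is fixed in advance; alpha and beta are
    illustrative symmetric Dirichlet concentrations.
    """
    rng = random.Random(seed)
    # Each component ("topic") is a distribution over words.
    topics = [sample_dirichlet(rng, beta, len(vocab)) for _ in range(num_topics)]
    corpus = []
    for _ in range(num_docs):
        # Each document (data point) has its own distribution over components.
        theta = sample_dirichlet(rng, alpha, num_topics)
        words = []
        for _ in range(doc_len):
            # Each word is associated with a single component ...
            z = rng.choices(range(num_topics), weights=theta)[0]
            # ... and is drawn from that component's distribution over words.
            w = rng.choices(vocab, weights=topics[z])[0]
            words.append((w, z))
        corpus.append((theta, words))
    return topics, corpus


if __name__ == "__main__":
    vocab = ["love", "kiss", "joke", "laugh"]  # hypothetical toy vocabulary
    topics, corpus = generate_corpus(num_docs=2, doc_len=6,
                                     num_topics=2, vocab=vocab)
    for theta, words in corpus:
        print([round(p, 2) for p in theta], [w for w, _ in words])
```

In a plain mixture model, theta would place all of its mass on one component per document; here a document can mix several components, which is the distinction the abstract draws.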

EPrint Type: Book Section
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Theory & Algorithms
ID Code: 7760
Deposited By: Sinead Williamson
Deposited On: 17 March 2011