PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Focused Topic Models
Sinead Williamson, Chong Wang, Katherine Heller and David Blei
In: NIPS workshop on Applications for Topic Models: Text and Beyond, 11 Dec 2009, Whistler, Canada.

Abstract

We present the focused topic model (FTM), a family of nonparametric Bayesian models for learning sparse topic mixture patterns. The FTM integrates desirable features of both the hierarchical Dirichlet process (HDP) and the Indian buffet process (IBP): it allows an unbounded number of topics for the entire corpus, while each document maintains a sparse distribution over these topics. We observe that the HDP assumes a correlation between the global and within-document prevalences of a topic, and note that such a relationship may be undesirable. By using an IBP to select which topics contribute to a document, and an unnormalized Dirichlet process to determine how much of the document is generated by each selected topic, the FTM decouples these probabilities, allowing for more flexible modeling. Experimental results on three text corpora demonstrate superior performance over the hierarchical Dirichlet process topic model.
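
The decoupling described in the abstract can be illustrated with a small sampling sketch. This is not the authors' implementation, and everything beyond what the abstract states is an assumption made for illustration: the finite truncation level, the stick-breaking form of the IBP, the gamma-distributed topic prevalences, the hyperparameter names `alpha` and `gamma`, and the function name `sample_focused_topic_proportions`. It only shows how a binary topic mask (which topics a document uses) can be drawn separately from the weights that say how much each selected topic contributes.

```python
import random


def sample_focused_topic_proportions(num_docs, truncation=20, alpha=3.0,
                                     gamma=1.0, seed=0):
    """Draw sparse per-document topic proportions from a truncated prior.

    Illustrative only: the truncation and both hyperparameters are
    assumptions of this sketch, not values taken from the paper.
    """
    rng = random.Random(seed)

    # Stick-breaking view of the IBP: pi[k] is the probability that topic k
    # is switched on in a document, and it shrinks as k grows.
    pi = []
    stick = 1.0
    for _ in range(truncation):
        stick *= rng.betavariate(alpha, 1.0)
        pi.append(stick)

    # Relative prevalence of each topic, drawn independently of pi, so how
    # often a topic appears and how much it contributes are decoupled.
    phi = [rng.gammavariate(gamma, 1.0) for _ in range(truncation)]

    docs = []
    for _ in range(num_docs):
        # Binary mask: the topics this document is allowed to use.
        active = [k for k in range(truncation) if rng.random() < pi[k]]
        if not active:
            # Sketch-only guard: keep the most probable topic so that the
            # normalization below is always defined.
            active = [max(range(truncation), key=lambda k: pi[k])]

        # Dirichlet draw restricted to the active topics, obtained by
        # normalizing independent gamma variates with shapes phi[k].
        weights = {k: rng.gammavariate(phi[k], 1.0) for k in active}
        total = sum(weights.values())
        if total == 0.0:
            # Numerical guard against underflow for very small shapes.
            weights = {k: 1.0 for k in active}
            total = float(len(active))

        docs.append([weights.get(k, 0.0) / total for k in range(truncation)])

    return pi, phi, docs


if __name__ == "__main__":
    _, _, docs = sample_focused_topic_proportions(num_docs=3, truncation=10)
    for theta in docs:
        print([round(x, 3) for x in theta])
```

In each printed vector, topics outside the document's mask have proportion exactly zero, which is the sparsity the abstract refers to; inference over real text is outside the scope of this sketch.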

EPrint Type: Conference or Workshop Item (Poster)
Subjects: Learning/Statistics & Optimisation
ID Code: 5811
Deposited By: Sinead Williamson
Deposited On: 08 March 2010