PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Word Features for Latent Dirichlet Allocation
James Petterson, A. J. Smola, Tiberio Caetano, Wray Buntine and S. Narayanamurthy
In: Advances in Neural Information Processing Systems (NIPS) (2011).

Abstract

We extend Latent Dirichlet Allocation (LDA) by explicitly allowing for the encoding of side information in the distribution over words. This results in a variety of new capabilities, such as improved estimates for infrequently occurring words, as well as the ability to leverage thesauri and dictionaries in order to boost topic cohesion within and across languages. We present experiments on multi-language topic synchronisation where dictionary information is used to bias corresponding words towards similar topics. Results indicate that our model substantially improves topic cohesion when compared to the standard LDA model.
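The core idea — tying the topic-word distribution to per-word side information — can be illustrated with a log-linear parameterization, where words with similar feature vectors (e.g. dictionary-linked translations) receive similar topic probabilities. The sketch below is illustrative only; the array names (`phi`, `lam`) and the specific softmax form are assumptions, not the paper's exact model, and the features and weights are random rather than learned.

```python
import numpy as np

# Illustrative sketch (not the paper's exact parameterization):
# each word v has a feature vector phi[v] encoding side information
# such as thesaurus or dictionary links; each topic k has a weight
# vector lam[k]. Words sharing features get correlated topic probabilities.

rng = np.random.default_rng(0)

V, K, F = 6, 2, 3                # vocabulary size, topics, feature dimension
phi = rng.normal(size=(V, F))    # word features (assumed, random here)
lam = rng.normal(size=(K, F))    # per-topic feature weights (learned in a full model)

# Log-linear topic-word distribution: beta[k, v] proportional to exp(lam[k] . phi[v]).
scores = lam @ phi.T                                    # (K, V) unnormalized log-probs
beta = np.exp(scores - scores.max(axis=1, keepdims=True))  # stabilized softmax
beta /= beta.sum(axis=1, keepdims=True)

# Each row of beta is now a valid distribution over the vocabulary,
# and infrequent words borrow statistical strength through shared features.
print(beta.shape)  # (2, 6)
```

Because probability mass flows through the features rather than per-word counts alone, two words linked by a dictionary entry are biased towards the same topics even across languages — the mechanism behind the topic-synchronisation experiments described above.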

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics;
Natural Language Processing;
Information Retrieval & Textual Information Access
ID Code: 7335
Deposited By: Wray Buntine
Deposited On: 17 March 2011