PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Theme-Topic Mixture Model for Document Representation
Mikaela Keller and Samy Bengio
In: Learning Methods for Text Understanding and Mining, 26 - 29 January 2004, Grenoble, France.


In automatic text processing tasks, documents are usually represented in the bag-of-words space. However, this representation does not take into account the possible relations between words. We review a family of document density estimation models for representing documents. Within this family, we derive a further model: the Theme-Topic Mixture Model (TTMM). This model assumes two types of relations among textual data: Topics link words to each other, and Themes gather documents that share a particular distribution over the topics. An experiment compares the performance of the models in this family on a common task.
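
As an illustration of the limitation mentioned above, the following is a minimal sketch of a bag-of-words representation. It is not taken from the paper: the function name `bag_of_words`, the whitespace tokenization, and the two toy documents are assumptions made for this example only. It shows that two documents with different meanings can map to the same count vector, since only word frequencies are retained.

```python
from collections import Counter

def bag_of_words(text, vocabulary):
    """Map a text to a vector of word counts over a fixed vocabulary.

    Hypothetical helper for illustration; tokenization is a plain
    lowercase whitespace split, which is an assumption of this sketch.
    """
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

# Two toy documents (assumed examples) with the same words in a different order.
docs = ["dog bites man", "man bites dog"]

# The vocabulary is the sorted set of all words seen in the toy corpus.
vocabulary = sorted({word for doc in docs for word in doc.lower().split()})

# One count vector per document.
vectors = [bag_of_words(doc, vocabulary) for doc in docs]

# Both documents receive the same vector: word order and any relation
# between words are not captured by the representation.
same_representation = vectors[0] == vectors[1]
```

Models such as those reviewed in the paper aim to go beyond this representation by modelling relations among words and among documents.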

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Theory & Algorithms; Information Retrieval & Textual Information Access
ID Code: 28
Deposited By: Steve Gunn
Deposited On: 09 May 2004