Theme-Topic Mixture Model for Document Representation
Mikaela Keller and Samy Bengio
In: Learning Methods for Text Understanding and Mining, 26 - 29 January 2004, Grenoble, France.
In Automatic Text Processing tasks, documents are usually
represented in the bag-of-words space. However, this representation
does not take into account the possible relations between words. We
review a family of document density estimation models for
representing documents. Within this family we derive a further
model: the Theme-Topic Mixture Model (TTMM). This
model assumes two types of relations among textual data. Topics
link words to each other, and Themes gather documents with a particular
distribution over the topics. An experiment compares the performance
of the different models in this family on a common task.
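
The two kinds of relations described above can be illustrated with a toy generative sketch. It is only one plausible reading of the abstract, not the authors' formulation: the vocabulary, the probability tables, and the names sample_document, THEME_PRIOR, THEME_TOPIC and TOPIC_WORD are hypothetical. A theme is drawn for the document, the theme fixes a distribution over topics, and each word is drawn from the topic selected for it. The result is returned as a bag-of-words count vector.

```python
import random
from collections import Counter

# Hypothetical toy parameters, chosen for illustration only (not from the paper).
VOCAB = ["goal", "match", "team", "vote", "party", "law"]

# P(word | topic): a topic links words that tend to occur together.
TOPIC_WORD = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],  # topic 0: sports-like words
    [0.0, 0.0, 0.0, 0.4, 0.3, 0.3],  # topic 1: politics-like words
]

# P(topic | theme): a theme gathers documents sharing a topic distribution.
THEME_TOPIC = [
    [0.9, 0.1],  # theme 0: mostly topic 0
    [0.2, 0.8],  # theme 1: mostly topic 1
]

# P(theme): prior over themes.
THEME_PRIOR = [0.5, 0.5]


def sample_document(n_words, rng):
    """Sample a theme, then n_words words; return (theme, bag-of-words counts)."""
    theme = rng.choices(range(len(THEME_PRIOR)), weights=THEME_PRIOR)[0]
    counts = Counter()
    for _ in range(n_words):
        # Each word first picks a topic from the theme's topic distribution,
        # then picks a word from that topic's word distribution.
        topic = rng.choices(range(len(TOPIC_WORD)), weights=THEME_TOPIC[theme])[0]
        word = rng.choices(VOCAB, weights=TOPIC_WORD[topic])[0]
        counts[word] += 1
    # Bag-of-words vector: one count per vocabulary entry, word order discarded.
    return theme, [counts[w] for w in VOCAB]


rng = random.Random(0)
theme, bow = sample_document(20, rng)
print("theme:", theme)
print("bag-of-words:", dict(zip(VOCAB, bow)))
```

In this sketch the count vector alone does not show which words are related; that structure lives in the topic and theme tables, which a density estimation model of this kind would have to learn from data.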