PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Learning a Word Concept Space for a New Representation of Text Data (original French title: Apprentissage d'un espace de concepts de mots pour une nouvelle représentation des données textuelles)
Young Min Kim, Jean-François Pessiot, Massih Amini and Patrick Gallinari
In: Proceedings of the 5th Conférence en Recherche d'Information et Applications, 12-14 Mar 2008, Trégastel, France.


We present in this paper an unsupervised learning method for reducing the dimensionality of text data. The technique rests on the hypothesis that terms co-occurring in the same contexts with the same frequencies are semantically related. Under this hypothesis, we first find term clusters using the classification version of the EM algorithm (CEM). Documents are then represented in the space of these term clusters. We evaluate the method on document clustering and show its effectiveness on two standard text classification collections, WebKB and Reuters.

EPrint Type: Conference or Workshop Item (Paper)
Subjects: Information Retrieval & Textual Information Access
ID Code: 3837
Deposited By: Massih Amini
Deposited On: 25 February 2008