Apprentissage d'un espace de concepts de mots pour une nouvelle représentation des données textuelles (Learning a word-concept space for a new representation of text data)

Abstract

We present an unsupervised learning method for dimensionality reduction of text data. The method rests on the hypothesis that terms co-occurring in the same contexts with the same frequencies are semantically related. Under this hypothesis, we first find term clusters using a classification variant of the EM algorithm. Documents are then represented in the space of these term clusters. We evaluate the method on the task of document clustering and show its effectiveness on two standard classification collections, WebKB and Reuters.
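The two steps described in the abstract (clustering terms, then re-expressing documents over those clusters) can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a multinomial mixture over each term's document-occurrence profile, hard assignments in place of soft posteriors, additive smoothing, and a toy count matrix. The names `cem_term_clusters`, `documents_in_concept_space`, and the parameter `alpha` are hypothetical.

```python
import numpy as np


def cem_term_clusters(X, k, n_iter=50, seed=0, alpha=1e-2):
    """Hard-assignment (classification) EM for a multinomial mixture.

    X is an (n_terms, n_docs) count matrix: each term is described by how
    often it occurs in each document. Returns one cluster label per term.
    The multinomial model and the smoothing constant alpha are assumptions.
    """
    rng = np.random.default_rng(seed)
    n_terms, n_docs = X.shape
    labels = rng.integers(0, k, size=n_terms)  # random initial partition
    for _ in range(n_iter):
        # M-step: re-estimate mixing weights and cluster profiles
        # from the current hard partition of the terms.
        log_pi = np.empty(k)
        log_theta = np.empty((k, n_docs))
        for c in range(k):
            members = X[labels == c]
            log_pi[c] = np.log((len(members) + 1.0) / (n_terms + k))
            counts = members.sum(axis=0) + alpha
            log_theta[c] = np.log(counts / counts.sum())
        # E-step and C-step: score every term under every cluster,
        # then assign it to the single most probable cluster.
        scores = X @ log_theta.T + log_pi  # shape (n_terms, k)
        new_labels = scores.argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break  # the partition is stable
        labels = new_labels
    return labels


def documents_in_concept_space(X, labels, k):
    """Represent each document by the summed counts of its terms per cluster."""
    n_terms = X.shape[0]
    membership = np.zeros((n_terms, k))
    membership[np.arange(n_terms), labels] = 1.0
    return X.T @ membership  # shape (n_docs, k)


# Toy data (made up): 6 terms, 4 documents.
X = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [4, 1, 0, 0],
    [0, 0, 2, 3],
    [0, 0, 3, 2],
    [0, 0, 1, 4],
])
k = 2
labels = cem_term_clusters(X, k)
Z = documents_in_concept_space(X, labels, k)
```

Each row of `Z` is a document described by `k` cluster counts instead of one count per term, which is the dimensionality reduction the abstract refers to. Because every term belongs to exactly one cluster, the total count of each document is preserved.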