PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

FUZZY CLUSTERING OF DOCUMENTS
Matjaz Jursic and Nada Lavrac
In: SiKDD 2008, 17 Oct 2008, Ljubljana, Slovenia.

Abstract

This paper presents a short overview of methods for fuzzy clustering and states the desired properties of an optimal fuzzy document clustering algorithm. Based on these criteria, we chose one of the most prominent fuzzy clustering methods: c-means, or more precisely, probabilistic c-means. This algorithm is presented in more detail, along with some empirical results from clustering 2-dimensional points and documents. For document clustering, we implemented fuzzy c-means in the TextGarden environment. We describe a few difficulties with the implementation and their possible solutions. In conclusion, we also propose further work needed to fully exploit the power of fuzzy document clustering in TextGarden.
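
The abstract names probabilistic fuzzy c-means but gives no formulas, so the following is only a minimal sketch of the standard textbook update rules (memberships proportional to inverse distances raised to 2/(m-1), centers as membership-weighted means). It is not the TextGarden implementation; the function name `fuzzy_c_means`, its parameters, the random initialisation from input points, and the use of Euclidean distance are all assumptions made for illustration.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Probabilistic fuzzy c-means on a list of equal-length tuples.

    Returns (centers, memberships); memberships[j][i] is the degree to
    which point j belongs to cluster i, and each row sums to 1.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # Assumption: initialise centers from randomly chosen input points.
    centers = [list(p) for p in rng.sample(points, c)]
    memberships = []
    for _ in range(iters):
        # Membership update: u_ji = 1 / sum_k (d_ji / d_jk)^(2/(m-1)).
        memberships = []
        for p in points:
            dists = [math.dist(p, ctr) for ctr in centers]
            if min(dists) == 0.0:
                # Point coincides with a center: share full membership
                # among the coinciding centers, zero elsewhere.
                zeros = [d == 0.0 for d in dists]
                row = [1.0 / sum(zeros) if z else 0.0 for z in zeros]
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                total = sum(inv)
                row = [v / total for v in inv]
            memberships.append(row)
        # Center update: weighted mean with weights u_ji^m.
        new_centers = []
        for i in range(c):
            weights = [row[i] ** m for row in memberships]
            wsum = sum(weights)
            new_centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / wsum
                for d in range(dim)
            ])
        # Stop once no center moved by more than tol.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships


if __name__ == "__main__":
    # Two small, well-separated groups of 2-dimensional points.
    pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.0),
           (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
    ctrs, u = fuzzy_c_means(pts, c=2)
    for point, row in zip(pts, u):
        print(point, [round(v, 3) for v in row])
```

For documents, the same loop would apply to document vectors (for example TF-IDF weights) in place of 2-dimensional points, possibly with a different distance measure; that choice is left open here because the abstract does not specify it.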

EPrint Type: Conference or Workshop Item (Paper)
Subjects: Theory & Algorithms; Information Retrieval & Textual Information Access
ID Code: 4972
Deposited By: Jan Rupnik
Deposited On: 24 March 2009