PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

A Methodology for Topographic Clustering of Structured Text Documents
Marie-Jeanne Lesot, Delphine Dard and Florence d'Alché-Buc
In: Learning Methods for Text Understanding and Mining, 26 - 29 January 2004, Grenoble, France.


Sets of texts are structured through a more or less refined hierarchy of sections, subsections and paragraphs. This structure carries information that should be exploited when handling such data, in particular to enrich the comparison of texts as a complement to the vector description of their contents. We propose a kernel-based methodology that follows this principle for a topographic clustering task, and define a hierarchical kernel that compares paragraphs using the available hierarchical decomposition, in particular the provided titles.
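
The abstract does not give the kernel's definition, so the following is only an illustrative sketch of the general idea, not the authors' formulation. It assumes each paragraph is a dict with its text and the list of titles enclosing it (outermost first), and it combines a content similarity with per-level title similarities. The function names, the `depth` and `content_weight` parameters, and the cosine bag-of-words similarity are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lower-cased term-frequency vector of a text (whitespace tokenisation)."""
    return Counter(text.lower().split())


def cosine_kernel(a, b):
    """Cosine similarity of two term-frequency vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def hierarchical_kernel(p, q, depth=2, content_weight=0.5):
    """Hypothetical hierarchical kernel between two paragraphs.

    p and q are dicts with keys "text" (str) and "titles" (list of str,
    outermost title first). Missing title levels are treated as empty.
    The result is a fixed non-negative combination of cosine kernels,
    one on the paragraph content and one per title level.
    """
    content = cosine_kernel(bag_of_words(p["text"]), bag_of_words(q["text"]))

    # Pad or truncate both title paths to the same fixed depth.
    titles_p = (list(p["titles"]) + [""] * depth)[:depth]
    titles_q = (list(q["titles"]) + [""] * depth)[:depth]
    title_sim = sum(
        cosine_kernel(bag_of_words(s), bag_of_words(t))
        for s, t in zip(titles_p, titles_q)
    ) / depth

    return content_weight * content + (1.0 - content_weight) * title_sim


# Made-up example paragraphs, for illustration only.
para_a = {"text": "kernel methods for clustering",
          "titles": ["Machine learning", "Kernels"]}
para_b = {"text": "kernel methods for clustering",
          "titles": ["Cooking", "Soups"]}
para_c = {"text": "slow roasted tomatoes",
          "titles": ["Cooking", "Soups"]}

print(hierarchical_kernel(para_a, para_b))  # same content, different titles
print(hierarchical_kernel(para_b, para_c))  # different content, same titles
```

Using a fixed depth and fixed weights keeps the combination a non-negative sum of kernels, which is itself a valid kernel; how the paper actually weights content against titles is not stated in the abstract.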

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Natural Language Processing; Theory & Algorithms; Information Retrieval & Textual Information Access
ID Code: 30
Deposited By: Steve Gunn
Deposited On: 09 May 2004