PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Classification automatique de structures arborescentes à l’aide du noyau de Fisher : Application aux documents XML
Ludovic Denoyer and Patrick Gallinari
In: Congres Sciences des Systemes, France (2005).


The widespread use of XML has created a need for tools that efficiently store, access, and organize XML corpora. The INEX initiative has led to major improvements in XML retrieval systems, but related tasks such as categorization and structure matching remain to be investigated. Here we consider the problem of clustering XML documents according to their structure. We propose a stochastic model based on belief networks that can describe different kinds of relations between structural elements. We show how such models can be used for clustering via the Fisher kernel method, and we test them on the INEX corpus.
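
For reference, the Fisher kernel mentioned in the abstract is usually defined as follows (the standard construction of Jaakkola and Haussler). The notation is an assumption made for illustration and is not taken from the paper: d and d' denote two documents, \theta the parameters of the generative model, and I(\theta) the Fisher information matrix.

```latex
U_d = \nabla_\theta \log P(d \mid \theta), \qquad
K(d, d') = U_d^{\top} \, I(\theta)^{-1} \, U_{d'}, \qquad
I(\theta) = \mathbb{E}_{d \sim P(\cdot \mid \theta)}\left[ U_d U_d^{\top} \right]
```

Under this definition each document is mapped to the gradient vector U_d, so a kernel-based or vector-space clustering method can operate on documents whose original representation is a tree. In practice I(\theta) is often approximated by the identity matrix; whether the paper does so is not stated in the abstract.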

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Learning/Statistics & Optimisation; Natural Language Processing; Information Retrieval & Textual Information Access
ID Code: 1436
Deposited By: Ludovic Denoyer
Deposited On: 28 November 2005