Classification automatique de structures arborescentes à l’aide du noyau de Fisher : Application aux documents XML
The widespread use of XML has urged the need to develop tools to efficiently store, access and organize XML corpus. The INEX initiative has resulted in major improvements in XML retrieval systems, but today, related tasks, like categorization or structure matching, should be investigated. We consider here the problem of clustering XML documents using their structure. In this paper, we propose a Belief networks-based stochastic model which is able to describe different kind of relation between structural elements. We show how these models can be used for the clustering task using the Fisher kernel method. We test them using the INEX corpus.