PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Document structure matching for heterogeneous corpora
Ludovic Denoyer, Guillaume Wisniewski and Patrick Gallinari
In: SIGIR 2004: Workshop on XML and Information Retrieval, 25-29 July 2004, Sheffield, UK.

Abstract

Querying heterogeneous collections of XML documents is an open problem. Solving it will require building some form of correspondence between the DTDs of the different sources. Here we consider the problem of matching the structure of XML documents coming from different sources. To this end, we introduce a stochastic model of structured documents and describe preliminary experiments performed on the INEX collection.
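To make the matching problem concrete, here is a minimal sketch under loudly stated assumptions. It is not the stochastic structured document model introduced in the paper; it swaps in a much simpler technique. Each corpus is flattened into hypothetical `(tag, text)` pairs, every tag gets a unigram word model with add-alpha (Laplace) smoothing, and each source tag is mapped to the target tag whose model gives the source tag's text the highest log-likelihood. The function names `tag_models`, `log_likelihood` and `match_tags`, the `alpha` parameter and the toy data are all illustrative.

```python
import math
from collections import Counter

# Hypothetical input format (an assumption, not from the paper): each corpus
# is a list of (tag, text) pairs extracted from its XML documents.

def tag_models(pairs):
    """Count word occurrences under each tag."""
    models = {}
    for tag, text in pairs:
        models.setdefault(tag, Counter()).update(text.lower().split())
    return models

def log_likelihood(words, counts, vocab_size, alpha=1.0):
    """Log-probability of `words` under an add-alpha smoothed unigram model."""
    total = sum(counts.values())
    denom = total + alpha * vocab_size
    return sum(math.log((counts[w] + alpha) / denom) for w in words)

def match_tags(source_pairs, target_pairs, alpha=1.0):
    """Map each source tag to its most likely target tag (text content only)."""
    src = tag_models(source_pairs)
    tgt = tag_models(target_pairs)
    # Shared vocabulary so that every target model is smoothed over the same words.
    vocab = set()
    for counts in list(src.values()) + list(tgt.values()):
        vocab.update(counts)
    mapping = {}
    for s_tag, s_counts in src.items():
        words = list(s_counts.elements())  # every word token seen under s_tag
        mapping[s_tag] = max(
            tgt, key=lambda t: log_likelihood(words, tgt[t], len(vocab), alpha)
        )
    return mapping

if __name__ == "__main__":
    # Toy, made-up data: two sources that label the same fields differently.
    source = [("author", "Ludovic Denoyer"), ("author", "Patrick Gallinari"),
              ("title", "Document structure matching")]
    target = [("au", "Guillaume Wisniewski Patrick Gallinari"),
              ("ti", "Structure matching for heterogeneous corpora")]
    print(match_tags(source, target))  # {'author': 'au', 'title': 'ti'}
```

This toy matcher looks only at text content and ignores the tree structure (parent/child and sibling relations) that a structured document model can exploit; it also lets several source tags map to the same target tag, so it is at best a baseline for the matching task.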

EPrint Type: Conference or Workshop Item (Paper)
Subjects: Learning/Statistics & Optimisation; Natural Language Processing; Information Retrieval & Textual Information Access
ID Code: 185
Deposited By: Ludovic Denoyer
Deposited On: 05 June 2004