Document structure matching for heterogeneous corpora
Ludovic Denoyer, Guillaume Wisniewski and Patrick Gallinari
In: SIGIR 2004: Workshop on XML and Information Retrieval, 25-29 July 2004, Sheffield, UK.
Querying heterogeneous XML document collections is an open problem. Solving it will require building some form of correspondence between the DTDs of the different sources. We consider here the problem of matching the structure of XML documents from different sources. To this end, we introduce a stochastic structured document model and describe preliminary experiments performed on the INEX collection.