PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

A belief network based generative model for structured documents
Ludovic Denoyer and Patrick Gallinari
In: MLDM 2003, Leipzig, Germany (2003).


We present a generative Bayesian model for structured (e.g. XML) documents. The model takes structure and content information into account simultaneously, and is used here to classify XML documents. We adopt a machine learning approach: the model parameters are learned from a labeled training set of representative documents. We discuss the role of structural information in classification and describe experiments on a small collection of class-labeled structured documents. We also present preliminary results showing how this model could classify documents whose DTDs are not represented in the training set.

EPrint Type:Conference or Workshop Item (Paper)
Project Keyword:UNSPECIFIED
Subjects:Learning/Statistics & Optimisation
Information Retrieval & Textual Information Access
ID Code:440
Deposited By:Ludovic Denoyer
Deposited On:22 December 2004