PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Using Belief Networks and Fisher Kernels for Structured Document Classification
Ludovic Denoyer and Patrick Gallinari
In: OKDD 2003, Cavtat, Croatia (2003).


We consider the classification of structured (e.g. XML) textual documents. We first propose a generative model based on Belief Networks that takes both structure and content information into account simultaneously. We then show how this model can be extended into a more efficient classifier using the Fisher kernel method. In both cases, the model parameters are learned from a labelled training set of representative documents. We present experiments on two collections of structured documents: WebKB, which has become a reference corpus for HTML page classification, and the new INEX corpus, which was developed for the evaluation of XML information retrieval systems.
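The Fisher kernel step can be illustrated with a minimal sketch. It is not the authors' belief-network model. As a simplifying assumption, it uses a single smoothed unigram (multinomial) model of word counts. It takes the Fisher score of a document to be the gradient of its log-likelihood with respect to the word probabilities, and it approximates the Fisher information matrix by the identity. The function names fit_unigram, fisher_scores and fisher_kernel are hypothetical.

```python
import numpy as np


def fit_unigram(docs, alpha=1.0):
    """Estimate word probabilities theta from document count vectors.

    Assumption: a single flat multinomial model with Laplace (add-alpha)
    smoothing stands in for the structured generative model.
    """
    counts = docs.sum(axis=0) + alpha
    return counts / counts.sum()


def fisher_scores(docs, theta):
    """Fisher score U_d = gradient of log P(d | theta) w.r.t. theta.

    For log P(d | theta) = sum_w n_w * log(theta_w) + const,
    the partial derivative with respect to theta_w is n_w / theta_w.
    """
    return docs / theta


def fisher_kernel(scores_a, scores_b):
    """Fisher kernel K(d, d') = U_d^T I^{-1} U_d', with I approximated by the identity."""
    return scores_a @ scores_b.T


# Toy example: two documents over a three-word vocabulary (rows are word counts).
docs = np.array([[2, 0, 1], [0, 3, 1]])
theta = fit_unigram(docs)
U = fisher_scores(docs, theta)
K = fisher_kernel(U, U)
print(theta)
print(K)
```

The resulting kernel matrix could be passed to any kernel classifier, such as an SVM. In the approach summarised above, the Fisher scores would come from the structure-aware belief network rather than from this flat word model.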

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Learning/Statistics & Optimisation; Information Retrieval & Textual Information Access
ID Code: 439
Deposited By: Ludovic Denoyer
Deposited On: 22 December 2004