PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

A Statistical Model for the Classification of Structured Documents (Un modèle statistique pour la classification de documents structurés)
Trang Vu, Ludovic Denoyer and Patrick Gallinari
In: EGC 2003, Lyon, France (2003).


We present a learning model for the categorization of structured documents that takes into account both structural and textual information. We first define a generative model of structured documents based on belief networks. We then turn this generative model into a discriminative one using the Fisher kernel. Finally, we describe an instance of this model applied to the categorization of HTML documents. Experiments on a classical corpus show that using structural information gives better performance than other classical models.
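The abstract does not give the model's equations, so the following is only a rough illustration of the Fisher-kernel step under stated assumptions. It assumes a hypothetical toy generative model in which each structural part of a document (identified by a tag such as "title" or "body") generates its words from its own multinomial P(word | tag). The Fisher score of a document is the gradient of its log-likelihood with respect to these parameters, and the kernel is the inner product of two scores with the Fisher information matrix replaced by the identity, a common simplification. The names fit_model, fisher_score and fisher_kernel, the smoothing parameter alpha and the toy corpus are illustrative assumptions, not the authors' belief-network model.

```python
from collections import Counter

# Hypothetical toy model, NOT the authors' belief network: each structural
# part of a document (a tag such as "title" or "body") generates its words
# from its own multinomial P(word | tag). A document is a list of
# (tag, tokens) pairs.

def fit_model(docs, alpha=1.0):
    """Estimate theta[tag][word] = P(word | tag) with additive smoothing alpha."""
    counts = {}
    vocab = set()
    for doc in docs:
        for tag, tokens in doc:
            counts.setdefault(tag, Counter()).update(tokens)
            vocab.update(tokens)
    theta = {}
    for tag, c in counts.items():
        total = sum(c.values()) + alpha * len(vocab)
        theta[tag] = {w: (c[w] + alpha) / total for w in vocab}
    return theta

def fisher_score(doc, theta):
    """Fisher score U = gradient of log P(doc | theta).

    With log P(doc | theta) = sum over parts and tokens of log theta[tag][w],
    the partial derivative w.r.t. theta[tag][w] is n(tag, w) / theta[tag][w]
    (the simplex constraint on theta is ignored in this sketch).
    Returned as a sparse dict keyed by (tag, word)."""
    score = {}
    for tag, tokens in doc:
        if tag not in theta:
            continue  # unseen tag: no parameters to differentiate
        for w, n in Counter(tokens).items():
            if w in theta[tag]:  # unseen words are skipped
                key = (tag, w)
                score[key] = score.get(key, 0.0) + n / theta[tag][w]
    return score

def fisher_kernel(u, v):
    """K(d1, d2) = U1^T I^{-1} U2 with the Fisher information I taken as identity."""
    return sum(x * v[k] for k, x in u.items() if k in v)

# Usage on a two-document toy corpus (invented for illustration).
train = [
    [("title", ["fisher", "kernel"]), ("body", ["generative", "model", "kernel"])],
    [("title", ["html", "page"]), ("body", ["structured", "document"])],
]
theta = fit_model(train)
u0 = fisher_score(train[0], theta)
u1 = fisher_score(train[1], theta)
print(round(fisher_kernel(u0, u0), 2))  # -> 198.75
print(fisher_kernel(u0, u1))            # -> 0 (no shared (tag, word) parameters)
```

In this toy model the score is indexed by (tag, word), so the same word contributes separately depending on the structural part it appears in; this is one simple way a kernel can reflect document structure. A fuller treatment would also differentiate the structural part of the model and estimate the Fisher information instead of using the identity.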

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Learning/Statistics & Optimisation; Information Retrieval & Textual Information Access
ID Code: 441
Deposited By: Ludovic Denoyer
Deposited On: 22 December 2004