|
Un modèle Statistique pour la classification de Documents Structurés AbstractWe present a learning model for categorization of structured documents that takes into account both structural information and textual information. We first define a generative model of structured documents using belief networks. Then we transform the generative model into a discriminative one using the Fisher kernel. Finally, we describe an instance of this model applied to the categorization of HTML documents. The experimental application to a classical corpus shows that the use of structural information outperforms other classical models.
[Edit] |