PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Structured Multimedia Document Classification
Ludovic Denoyer, Patrick Gallinari, Jean-Noel Vittaut, Sylvie Brunesseaux and Stephan Brunesseaux
In: ACM DOCENG 2003, Grenoble, France (2003).


We propose a new statistical model for the classification of structured documents and consider its use for multimedia document classification. Its main originality is its ability to take into account simultaneously the structural and the content information present in a structured document, and to cope with different types of content (text, image, etc.). We present experiments on the classification of multilingual pornographic HTML pages using text and image data. The system accurately classifies pornographic sites in 8 European languages. The corpus was developed by EADS in the context of a large Web site filtering application.

EPrint Type: Conference or Workshop Item (Paper)
Subjects: Learning/Statistics & Optimisation; Information Retrieval & Textual Information Access
ID Code: 438
Deposited By: Ludovic Denoyer
Deposited On: 22 December 2004