Modélisation de documents combinant texte et image : application à la catégorisation et à la recherche d'information multimédia
Exploiting multimedia documents leads to representation problems of the textual and visual information within documents. Our goal is to propose a model to represent these both information and to combine them for two tasks: categorization and information retrieval. This model represents documents as bags of words, which requires to define adapted vocabularies. The textual vocabulary, usually very large, corresponds to the words of documents while the visual one is created by extracting low-level features from images. We study the different steps of its creation and the tf.idf weighting of visual words in images usually used for textual words. In the context of the text categorization, we introduce a criterion to select the most discriminative words for categories in order to reduce the vocabulary size without degrading the results of classification. We also present in the multilabel context, a method that lets us to select the number of categories which must be associated with a document. In multimedia information retrieval, we propose an analytical approach based on machine learning techniques to linearly combine the results from textual and visual information which significantly improves research results. Our model has shown its efficiency on different collections of important size and was evaluated in several international competitions such as XML Mining and ImageCLEF.