PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Crossing textual and visual content in different application scenarios
Julien Ah-Pine, Marco Bressan, Stephane Clinchant, Gabriela Csurka, Yves Hoppenot and Jean-Michel Renders
Multimedia Tools and Applications, Volume 41, Number 1, pp. 31-56, 2009. ISSN 1380-7501 (Print), 1573-7721 (Online)

Abstract

This paper deals with multimedia information access. We propose two new approaches for hybrid text-image information processing that can be straightforwardly generalized to the broader multimodal scenario. Both approaches fall into the trans-media pseudo-relevance feedback category. Our first method uses a mixture model of the aggregate components, considering them as a single relevance concept. In our second approach, we define trans-media similarities as an aggregation of monomodal similarities between the elements of the aggregate and the new multimodal object. We also introduce the monomodal similarity measures for text and images that serve as basic components for both proposed trans-media similarities. We show how a large variety of problems can be framed so as to address them with the proposed techniques: image annotation or captioning, text illustration, and multimedia retrieval and clustering. Finally, we present how these methods can be integrated into two applications: a travel blog assistant system and a tool for browsing Wikipedia that takes into account the multimedia nature of its content.

EPrint Type: Article
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Multimodal Integration; Information Retrieval & Textual Information Access
ID Code: 5303
Deposited By: Gabriela Csurka
Deposited On: 24 March 2009