PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Automatic annotation of unique locations from video and text
Chris Engels, Koen Deschacht, Jan-Hendrik Becker, Tinne Tuytelaars, Marie-Francine Moens and Luc Van Gool
In: 21st British Machine Vision Conference (BMVC 2010), 31 Aug - 3 Sept 2010, Aberystwyth, UK.


Given a video and associated text, we propose an automatic annotation scheme in which we employ a latent topic model to generate topic distributions from weighted text and then modify these distributions based on visual similarity. We apply this scheme to location annotation of a television series for which transcripts are available. The topic distributions allow us to avoid explicit classification, which is useful in cases where the exact number of locations is unknown. Moreover, many locations are unique to a single episode, making it impossible to obtain representative training data for a supervised approach. Our method first segments the episode into scenes by fusing cues from both images and text. We then assign location-oriented weights to the text and generate topic distributions for each scene using Latent Dirichlet Allocation. Finally, we update the topic distributions using the distributions of visually similar scenes. We formulate visual similarity between scenes as an Earth Mover’s Distance problem. We quantitatively validate our multi-modal approach to segmentation and qualitatively evaluate the resulting location annotations. Our results demonstrate that we are able to generate accurate annotations, even for locations only seen in a single episode.
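The final step of the pipeline described above (updating each scene's text-derived topic distribution using the distributions of visually similar scenes) can be sketched as follows. This is an illustrative sketch, not the authors' code: the function name, the mixing weight `alpha`, and the convex-combination update rule are assumptions for illustration; the paper's actual update and its Earth Mover's Distance similarity computation may differ.

```python
# Illustrative sketch (not the authors' implementation): blend each scene's
# LDA topic distribution with a similarity-weighted average of the other
# scenes' distributions. `alpha` (assumed here) controls how much weight
# stays on the original text-based distribution.

def update_topic_distributions(topics, similarity, alpha=0.5):
    """topics: one topic distribution (list summing to 1) per scene.
    similarity: similarity[i][j] in [0, 1] between scenes i and j.
    Returns updated, renormalised distributions."""
    n = len(topics)
    k = len(topics[0])
    updated = []
    for i in range(n):
        # Similarity-weighted average of the other scenes' distributions.
        total = sum(similarity[i][j] for j in range(n) if j != i) or 1.0
        visual = [
            sum(similarity[i][j] * topics[j][t] for j in range(n) if j != i) / total
            for t in range(k)
        ]
        # Convex combination of text-based and visually propagated topics.
        mixed = [alpha * topics[i][t] + (1 - alpha) * visual[t] for t in range(k)]
        norm = sum(mixed)
        updated.append([m / norm for m in mixed])
    return updated


# Toy example: scenes 0 and 2 are visually similar, scene 1 is unrelated.
topics = [[0.9, 0.1], [0.1, 0.9], [0.85, 0.15]]
similarity = [[1.0, 0.0, 0.9],
              [0.0, 1.0, 0.0],
              [0.9, 0.0, 1.0]]
updated = update_topic_distributions(topics, similarity)
```

In this toy example, scene 0's distribution is pulled toward scene 2's, while scene 1 (with no visually similar scenes) keeps its original text-based distribution after renormalisation.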

EPrint Type: Conference or Workshop Item (Paper)
Subjects: Machine Vision; Multimodal Integration
ID Code: 7888
Deposited By: Tinne Tuytelaars
Deposited On: 17 March 2011