PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

PicSOM Experiments in TRECVID 2010
Mats Sjöberg, Markus Koskela, Milen Chechev and Jorma Laaksonen
In: TRECVID 2010 Workshop, 15 Nov - 17 Nov 2010, Gaithersburg, MD.

Abstract

Our experiments in TRECVID 2010 include participation in the semantic indexing and known-item search tasks. In the semantic indexing task we implemented SVM-based classifiers on five different low-level visual features extracted from the keyframes. In addition to the main keyframes provided by NIST, we also extracted and analysed additional frames from longer shots. The feature-wise classifiers were fused using the standard and a weighted geometric mean. We submitted the following four runs:

PicSOM_geom: geometric mean of five features, all keyframes
PicSOM_wgeom: weighted geometric mean of five features, all keyframes
PicSOM_2geom-mkf: geometric mean of two "best" features, main keyframe only
PicSOM_2geom-max: geometric mean of two "best" features, all keyframes

The runs 2geom-max and wgeom obtained the highest MIAP scores, with essentially the same score (0.0697 vs. 0.0694). Overall, using more keyframes always improved the results substantially. Our weighting approach improved on the standard geometric mean; however, using only two features in the fusion without weighting achieved a similar result.

In the known-item search task we submitted two automatic and two interactive runs:

PicSOM_1: text search + concept detectors with distribution
PicSOM_2: text search + concept detectors with rank
PicSOM_3: interactive with detail view ("normal")
PicSOM_4: interactive without detail view ("fast")

Our automatic runs used text search with a single video-level index containing all the ASR text plus the title, description and subjects from the metadata. In addition, we used automatic selection of concepts based on matching keywords in the query text. We tried two approaches for combining the concept detector outcomes with the text search results; both received very similar scores (0.264 vs. 0.260 in mean reciprocal rank).
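The fusion of the feature-wise classifier outputs can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, score layout, and the log-domain formulation are assumptions; the weighted geometric mean of scores s_i with normalised weights w_i is exp(sum_i w_i * log s_i), which reduces to the standard geometric mean with uniform weights.

```python
import numpy as np

def fuse_geometric(scores, weights=None):
    """Fuse per-feature detector scores with a (weighted) geometric mean.

    scores : array-like of shape (n_features, n_shots), scores in (0, 1]
    weights: optional per-feature weights; normalised to sum to 1.
             If omitted, this reduces to the plain geometric mean.
    """
    scores = np.asarray(scores, dtype=float)
    n_features = scores.shape[0]
    if weights is None:
        weights = np.full(n_features, 1.0 / n_features)
    else:
        weights = np.asarray(weights, dtype=float)
        weights = weights / weights.sum()
    # Weighted geometric mean computed in the log domain for stability;
    # scores are clipped away from zero before taking the logarithm.
    log_scores = np.log(np.clip(scores, 1e-12, None))
    return np.exp(np.tensordot(weights, log_scores, axes=1))

# Example: two features scoring three shots (illustrative values only)
feature_scores = [[0.4, 0.8, 0.1],
                  [0.9, 0.2, 0.1]]
fused = fuse_geometric(feature_scores)            # plain geometric mean
fused_w = fuse_geometric(feature_scores, [3, 1])  # first feature weighted 3:1
```

The fused score per shot is then used to rank the shots for each concept.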
Our interactive runs used a very simple setup: the results of the PicSOM_1 automatic run were presented in order on a set of screens through which the user could browse to find the correct result. When a promising video was found, the user could examine a detailed view from which he or she could access the oracle service. We also tried a faster variant of the system in which the user could make quicker decisions and use the oracle directly from the overview screen. The fast version received a lower user satisfaction score (5.0 vs. 6.0) but higher performance (0.455 vs. 0.318 in mean reciprocal rank).
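The mean reciprocal rank (MRR) figures quoted above average, over the search topics, the reciprocal of the rank at which the correct video was returned. A minimal sketch of the measure (function name and the handling of unfound items are assumptions, not the official TRECVID scoring tool):

```python
def mean_reciprocal_rank(ranks):
    """Mean reciprocal rank over a set of known-item search topics.

    ranks: for each topic, the 1-based rank of the correct video in the
           returned list, or None if it was not found. Topics where the
           item was not returned contribute 0 to the average (assumed
           convention for this sketch).
    """
    total = 0.0
    for rank in ranks:
        if rank is not None:
            total += 1.0 / rank
    return total / len(ranks)

# Example: four topics, found at ranks 1, 2, and 4; one not found
score = mean_reciprocal_rank([1, 2, None, 4])  # (1 + 0.5 + 0 + 0.25) / 4
```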

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Machine Vision
Multimodal Integration
Information Retrieval & Textual Information Access
ID Code: 7982
Deposited By: Markus Koskela
Deposited On: 17 March 2011