PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

PicSOM Experiments in TRECVID 2009
Mats Sjöberg, Ville Viitaniemi, Markus Koskela and Jorma Laaksonen
In: TRECVID 2009 Workshop, 16 Nov - 17 Nov 2009, Gaithersburg, MD.


Our experiments in TRECVID 2009 include participation in the high-level feature extraction and automatic search tasks.

In the high-level feature (HLF) extraction task, we used a general system architecture based on feature fusion. It utilizes a large number of SVM detectors, followed by a post-processing stage that exploits the concepts' temporal and inter-concept co-occurrences. We submitted the following six runs:

PicSOM.base: Baseline run using our SOM-based HLF detection method.
PicSOM.A-ngram: Baseline SVM-based run using HLF-wise geometric mean fusion and temporal n-gram post-processing.
PicSOM.B-ngram: As previous, but also includes early fusion, multi-fold SFBS fusion, and more elaborate SVM training.
PicSOM.E-ngram: As previous, but includes two-stage fusion utilizing cross-concept co-occurrence.
PicSOM.spec-ngram: A run where the method was selected for each HLF separately using cross-validation.
PicSOM.spec-any: As previous, but the post-processing also used clustering-based inter-concept co-occurrence analysis.

The results show that feature fusion can consistently outperform all single features, that multi-fold SFBS performed best of the tested fusion methods, and that temporal n-gram analysis is beneficial. Early fusion and post-processing based on inter-concept co-occurrences did not improve performance.

In the search task, we concentrated on fully automatic runs in the standard search task. We combined ASR/MT text search and concept-based retrieval. If none of the concept models could be matched with the query, we used content-based retrieval based on the video and image examples instead. We submitted the following ten fully automatic runs:

F_A_N_PicSOM_1_10: text search baseline.
F_A_N_PicSOM_2_9: visual baseline.
F_A_N_PicSOM_3_8: own concepts.
F_A_N_PicSOM_4_7: own concepts + text search.
F_A_N_PicSOM_5_6: donated concepts.
F_A_N_PicSOM_6_5: donated concepts + text search.
F_A_N_PicSOM_7_4: own + donated concepts.
F_A_N_PicSOM_8_3: own + donated concepts + text search.
F_A_N_PicSOM_9_2: own + donated (dupl.) concepts.
F_A_N_PicSOM_10_1: own + donated (dupl.) concepts + text search.

In the above list, "own" concepts refer to our own HLF detectors, and "donated" concepts consist of the MediaMill (MM) concepts plus the CU-VIREO374 concepts. In all but the last two runs, CU-VIREO374 concepts are used only for words for which no MediaMill concept could be matched. The results show again that concept-based retrieval performed better than content-based search alone. Text search yielded a small improvement in combination with other modalities, but performed poorly on its own. Concept selection was done both with word matching and with example-based matching, i.e. selecting concepts based on how well they would fit our own concept models.
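As an illustration of the geometric mean fusion named for the PicSOM.A-ngram run, the following minimal Python sketch fuses the scores that several detectors assign to the same shots for one concept. The abstract does not specify the inputs or any normalisation, so the function name, the assumption that scores lie in [0, 1], and the `eps` floor for zero scores are all hypothetical choices made for this example, not the authors' implementation.

```python
import math


def geometric_mean_fusion(scores_per_detector, eps=1e-12):
    """Fuse per-shot scores from several detectors of one concept.

    scores_per_detector: list of equal-length lists; each inner list holds
    one detector's scores (assumed to lie in [0, 1]) for the same shots.
    eps: hypothetical floor that keeps log() defined for zero scores.
    """
    if not scores_per_detector:
        raise ValueError("need at least one detector")
    n_shots = len(scores_per_detector[0])
    if any(len(s) != n_shots for s in scores_per_detector):
        raise ValueError("all detectors must score the same shots")

    fused = []
    for shot_scores in zip(*scores_per_detector):
        # Average in log space, then exponentiate: this is the geometric mean.
        log_mean = sum(math.log(max(s, eps)) for s in shot_scores) / len(shot_scores)
        fused.append(math.exp(log_mean))
    return fused
```

Compared with an arithmetic mean, the geometric mean lets a single very low detector score pull the fused score down strongly, so a shot ranks high only when the detectors broadly agree.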

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Machine Vision
Multimodal Integration
Information Retrieval & Textual Information Access
ID Code: 6676
Deposited By: Markus Koskela
Deposited On: 08 March 2010