PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

PicSOM Experiments in TRECVID 2006
Mats Sjöberg, Hannes Muurinen, Jorma Laaksonen and Markus Koskela
In: TRECVID 2006 Workshop, 13 Nov 2006, Gaithersburg, MD.

Abstract

Our experiments in TRECVID 2006 include participation in the shot boundary detection, high-level feature extraction, and search tasks, using a common system framework based on multiple parallel Self-Organizing Maps (SOMs). In the shot boundary detection task, we projected feature vectors calculated from successive frames onto parallel SOMs and monitored the resulting trajectories to detect the shot boundaries. The trajectory-based method seemed to work comparatively well in the task. By comparing the F1 scores of the runs, we found that the results mostly degraded when only a portion of the data was used in training. The channel-specific detectors in particular seemed to suffer from overfitting and did not work well, probably because of the small amount of channel-specific training data compared to the number of adjustable parameters. In the high-level feature extraction task, we applied a method of representing semantic concepts as class models on a set of parallel SOMs, combined with an inverse file created from automated speech recognition and machine translation (ASR/MT) data. We observed an increase in performance when adding both textual features and the auxiliary concepts to the visual features baseline. In the search task, we submitted a total of six runs (five automatic and one interactive). Our method used SOM and inverse file indices from visual and textual features, combined with class models of appropriate semantic concepts. Using class models created from the LSCOM concepts improved the retrieval performance as measured by MAP scores. The entity detection in the last automatic run also proved successful and seems to be a promising topic for future experiments.
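
The abstract compares runs by F1 score (shot boundary detection) and by MAP (search). As a reading aid, the following minimal sketch shows the textbook definitions of these two measures. It is not the official TRECVID evaluation procedure, which may differ in details such as boundary matching tolerances and result list truncation; the function names and the example counts are hypothetical.

```python
from typing import Sequence, Tuple


def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall, computed from detection counts."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


def average_precision(ranked_relevance: Sequence[bool], total_relevant: int) -> float:
    """Non-interpolated average precision of one ranked result list.

    ranked_relevance[i] is True if the item at rank i + 1 is relevant.
    total_relevant is the number of relevant items in the whole collection,
    so relevant items missing from the list lower the score.
    """
    if total_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / total_relevant


def mean_average_precision(topics: Sequence[Tuple[Sequence[bool], int]]) -> float:
    """Mean of the per-topic average precision values."""
    if not topics:
        return 0.0
    return sum(average_precision(r, n) for r, n in topics) / len(topics)


if __name__ == "__main__":
    # Hypothetical counts: 8 correct boundaries, 2 false alarms, 2 misses.
    print(f1_score(8, 2, 2))
    # Two hypothetical search topics with short ranked lists.
    print(mean_average_precision([([True, False, True], 2), ([False, True], 1)]))
```

F1 weights precision and recall equally, so a detector that raises many false alarms or misses many boundaries scores low either way. MAP rewards placing relevant shots near the top of each ranked list and averages that over all topics.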

EPrint Type: Conference or Workshop Item (Poster)
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Machine Vision; Natural Language Processing; Speech; Information Retrieval & Textual Information Access
ID Code: 2596
Deposited By: Jorma Laaksonen
Deposited On: 14 February 2008