PicSOM experiments in TRECVID 2005
Our experiments in TRECVID 2005 included participation in the high-level feature extraction and search tasks. In the high-level feature extraction task, we applied a method that represents semantic concepts as class models on a set of parallel Self-Organizing Maps (SOMs). We submitted one run, A_PicSOM_1, in which a feature selection scheme was applied to each concept separately. The results showed that the SOM-based class models can represent semantic concepts on multimodal feature indices and that the proposed method is suitable for detecting video shots with specific semantic content.

In the search task, we submitted a total of seven runs: three automatic, three manual, and one interactive. Our main motivation was to study the benefit of combining parallel multimodal features and class models compared to using text-based queries alone. The settings for the runs were as follows:

F_A_1_SOM-F1_7: a baseline automatic run using only ASR/MT output
F_A_2_SOM-F2_3: an automatic run using ASR/MT output, multimodal features, and class models
F_A_2_SOM-F3_5: an automatic run using multimodal features and class models
M_A_1_SOM-M1_6: a baseline manual run using only ASR/MT output
M_A_2_SOM-M2_4: a manual run using ASR/MT output and multimodal features
M_A_2_SOM-M3_2: a manual run using ASR/MT output, multimodal features, and class models
I_A_2_SOM-I_1: an interactive run

In both the automatic and manual experiments, we observed that the proposed method successfully combines the text query, multimodal features, and class models. In both cases, the overall best results were obtained using all three information sources, with the MAP value nearly doubling compared to text-only search. Our small-scale interactive search experiments were performed with our prototype retrieval interface, which supports only relevance-feedback-based retrieval.
Still, the experiments demonstrate that the proposed method can also be used in an interactive setting, where the search is guided with iterative feedback from the user.
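The class-model idea described above can be illustrated with a minimal sketch: positive example vectors for a concept are mapped to their best-matching units (BMUs) on a trained SOM, the resulting hit counts are low-pass filtered over the map surface into a relevance field, and a test shot is scored by the field value at its own BMU. This is only a toy illustration under assumed parameters; the function names (`train_som`, `class_model`, `score`), the grid size, and the decay schedules are hypothetical and do not reproduce the actual PicSOM implementation.

```python
import numpy as np

def train_som(data, grid=(8, 8), iters=2000, seed=0):
    """Train a small SOM with a shrinking Gaussian neighborhood (toy sketch)."""
    rng = np.random.default_rng(seed)
    h, w = grid
    units = rng.normal(size=(h * w, data.shape[1]))
    # grid coordinates of each unit, used by the neighborhood function
    coords = np.array([(i, j) for i in range(h) for j in range(w)], float)
    for t in range(iters):
        x = data[rng.integers(len(data))]
        bmu = np.argmin(((units - x) ** 2).sum(axis=1))
        frac = t / iters
        sigma = 3.0 * (1 - frac) + 0.5      # neighborhood radius decays over time
        lr = 0.5 * (1 - frac) + 0.01        # learning rate decays over time
        d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
        nbh = np.exp(-d2 / (2 * sigma ** 2))
        units += lr * nbh[:, None] * (x - units)
    return units, coords

def class_model(units, coords, positives, sigma=1.0):
    """Build a class model: accumulate BMU hits of positive examples and
    low-pass filter the hit counts over the map surface."""
    field = np.zeros(len(units))
    for x in positives:
        bmu = np.argmin(((units - x) ** 2).sum(axis=1))
        field[bmu] += 1.0
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    kernel = np.exp(-d2 / (2 * sigma ** 2))
    return kernel @ field

def score(units, field, x):
    """Score a shot's feature vector by the field value at its BMU."""
    return field[np.argmin(((units - x) ** 2).sum(axis=1))]
```

With parallel SOMs (one per feature type), per-map scores would simply be summed per shot, which is how multimodal features and class models can be combined with text-query scores into a single ranking.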