PicSOM experiments in TRECVID 2008
Our experiments in TRECVID 2008 include participation in the high-level feature extraction, automatic search, video summarization, and video copy detection tasks, using a common system framework.

In the high-level feature extraction task, we extended last year’s experiments, which used SOM-based semantic concept modeling followed by a post-processing stage utilizing the concepts’ temporal and inter-concept co-occurrences. We also studied the effects of a more comprehensive feature selection and the inclusion of audio features and face detection. We submitted the following six runs:

• A_PicSOM_1_6: Visual features, baseline feature selection
• A_PicSOM_2_2: Visual features, baseline feature selection, temporal context
• A_PicSOM_3_5: Visual features, extended feature selection
• A_PicSOM_4_4: All features, extended feature selection
• A_PicSOM_5_3: All features, extended feature selection, PRF
• A_PicSOM_6_1: All features, extended feature selection, temporal context

The results show that a more comprehensive feature selection can be useful, and that the temporal and inter-concept co-occurrence analysis has the potential to improve performance if good concept-wise post-processors can be chosen. The use of audio features and face detection resulted in minor improvements.

In the search task, we again concentrated on the fully-automatic runs. We combined ASR/MT text search and concept-based retrieval. If none of the concept models could be matched with the query, we instead used content-based retrieval based on the video and image examples. We also experimented with topic-wise feature selection and the addition of face detection and motion-based features.
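The search strategy described above (text search combined with concept-based retrieval, falling back to example-based retrieval when no concept model matches the query) can be illustrated with a minimal sketch. The function names, the weighted-sum fusion, and the equal default weights are assumptions made for illustration only; the abstract does not specify how the modalities are actually combined.

```python
from typing import Dict, List, Optional

# Shot identifier -> relevance score. The score maps are assumed to be
# produced elsewhere (text search, concept models, example-based retrieval).
Scores = Dict[str, float]


def fuse(text: Scores, visual: Scores, w_text: float = 0.5) -> Scores:
    """Weighted sum of two score maps (an assumed fusion rule).

    A shot missing from one modality contributes a score of 0 there.
    """
    shots = set(text) | set(visual)
    return {
        s: w_text * text.get(s, 0.0) + (1.0 - w_text) * visual.get(s, 0.0)
        for s in shots
    }


def automatic_search(
    text_scores: Scores,
    concept_scores: Optional[Scores],
    example_scores: Scores,
) -> List[str]:
    """Rank shots by text search fused with one visual modality.

    Concept-based scores are used when at least one concept model matched
    the query; otherwise the example-based scores serve as the fallback.
    """
    visual = concept_scores if concept_scores else example_scores
    fused = fuse(text_scores, visual)
    # Sort by descending score; ties are broken by shot id for determinism.
    return sorted(fused, key=lambda s: (-fused[s], s))


if __name__ == "__main__":
    text = {"shot_a": 0.9, "shot_b": 0.1}
    concepts = {"shot_b": 1.0, "shot_c": 0.4}
    examples = {"shot_a": 1.0}
    print(automatic_search(text, concepts, examples))
    print(automatic_search(text, None, examples))
```

Passing `None` (or an empty map) for `concept_scores` models the case in which no concept model could be matched with the query.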
We submitted the following six fully-automatic runs:

• F_A_1_PicSOM_1_6: Required text search baseline
• F_A_1_PicSOM_2_5: Alternative visual baseline, only examples
• F_A_1_PicSOM_3_4: Alternative visual baseline, examples or concepts
• F_A_2_PicSOM_4_3: Text search + visual examples or concepts
• F_A_2_PicSOM_5_2: Text search + visual examples or concepts with feature selection
• F_A_2_PicSOM_6_1: Text search + visual examples or concepts with feature selection + additional features

The results show that the combination of concept-based retrieval and text search performed better than any of the single modalities in the baseline runs. Concept-based feature selection and additional features, however, degraded the average results.

In BBC rushes summarization, we submitted one run that extended last year’s approach, which consists of initial shot boundary detection followed by shot content analysis and by similarity assessment and pruning. We included new detectors for frames containing clapper boards, three different motion detectors, and a speech detector. The results of our summarization run are quite close to the median in the fraction of ground-truth inclusions found and in redundancy, with a somewhat shorter average duration than the median. Our run’s performance was above the median on the amount of junk present and on tempo/rhythm.

For video copy detection, we submitted some preliminary experiments based on our algorithm for shot similarity determination in video summarization. We used only the video modality.
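The shot similarity assessment and pruning step mentioned for the summarization run can be illustrated with a small sketch. It assumes that each shot is already represented by a fixed-length feature vector, and it uses cosine similarity with a greedy threshold rule; the similarity measure, the threshold value, and the function names are hypothetical choices for illustration, not the method used in the submitted run.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def prune_similar_shots(
    features: Sequence[Sequence[float]], threshold: float = 0.95
) -> List[int]:
    """Greedily keep shots that are not too similar to any kept shot.

    Shots are visited in temporal order; a shot is dropped when its
    similarity to an already kept shot reaches the (assumed) threshold.
    Returns the indices of the kept shots.
    """
    kept: List[int] = []
    for i, feat in enumerate(features):
        if all(cosine(feat, features[j]) < threshold for j in kept):
            kept.append(i)
    return kept


if __name__ == "__main__":
    shots = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0], [0.0, 0.9]]
    print(prune_similar_shots(shots))
```

In this toy input the second and fourth shots are near-duplicates of the first and third, so only the first and third would be retained for the summary.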