PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Audio-Visual Feature Extraction for Semi-Automatic Annotation of Meetings
Marian Kepesi, Michael Neffe, Tuan Van Pham, Michael Grabner, Helmut Grabner and Andreas Juffinger
In: International Workshop on Multimedia Signal Processing 2006, 03-06 Oct 2006, Victoria, BC, Canada.

Abstract

In this paper we present the building blocks of our semi-automatic annotation tool, which supports multi-modal and multi-level annotation of meetings. The main focus is on the proper design and functionality of the modules for recognizing meeting actions. The key features, the identity and position of the speakers, are provided by different modalities (audio and video). Three audio algorithms (Voice Activity Detection, Speaker Identification and Direction of Arrival) and three video algorithms (Detection, Tracking and Identification) form the low-level feature-extraction components. The low-level features are merged automatically, and the recognized actions are visualized and proposed to the user. The annotation labels are related, but not limited, to events during meetings. The user can finally confirm or, if necessary, modify each suggestion and then store the actions in a database.
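The abstract describes merging frame-level audio cues (VAD, speaker identity, direction of arrival) with video-based identity into proposed meeting actions for the annotator to confirm. A minimal sketch of such a fusion step is shown below; all class names, the frame format, and the fallback rule (prefer audio identity, fall back to video identity) are illustrative assumptions, not the authors' actual interfaces or fusion logic.

```python
# Hypothetical sketch of the fusion step: per-frame low-level features
# (VAD, speaker ID, DOA from audio; identity from video) are merged
# into proposed speaker-turn events that an annotator can confirm or
# modify. All names and the frame format are assumptions.

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Frame:
    time: float               # frame timestamp in seconds
    voice_active: bool        # Voice Activity Detection result
    audio_id: Optional[str]   # Speaker Identification result, if any
    doa_deg: Optional[float]  # Direction of Arrival estimate, if any
    video_id: Optional[str]   # video-based person identification, if any


@dataclass
class Turn:
    speaker: str
    start: float
    end: float


def fuse(frames: List[Frame]) -> List[Turn]:
    """Merge frame-level cues into proposed speaker turns.

    Audio identity is trusted when present; otherwise the video
    identity is used (a simple stand-in for the paper's fusion).
    """
    turns: List[Turn] = []
    current: Optional[Turn] = None
    for f in frames:
        speaker = f.audio_id or f.video_id
        if f.voice_active and speaker:
            if current and current.speaker == speaker:
                current.end = f.time      # extend the running turn
            else:
                if current:
                    turns.append(current)  # close the previous turn
                current = Turn(speaker, f.time, f.time)
        else:
            if current:
                turns.append(current)
                current = None
    if current:
        turns.append(current)
    return turns
```

In a tool like the one described, the resulting `Turn` proposals would be visualized so the user can confirm or modify them before they are stored in the database.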

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Machine Vision; Speech; Multimodal Integration
ID Code: 2544
Deposited By: Andreas Juffinger
Deposited On: 22 November 2006