Detecting Interest-Level in Meetings
Daniel Gatica-Perez, Iain McCowan, Dong Zhang and Samy Bengio
In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2005.
Finding relevant segments in meeting recordings is important for
summarization, browsing, and retrieval purposes. In this paper, we
define relevance as the interest-level that meeting participants
manifest as a group during the course of their interaction (as
perceived by an external observer), and investigate the automatic
detection of high-interest segments from audio-visual cues.
This is motivated by the assumption that there is a relationship
between segments of interest to participants and those of
interest to the end user, e.g., the user of a meeting browser. We first
address the problem of human annotation of group interest-level.
On a 50-meeting corpus, recorded in a room equipped with multiple
cameras and microphones, we found that the annotations generated
by multiple people exhibit a good degree of consistency, providing
a stable ground-truth for automatic methods. For the automatic
detection of high-interest segments, we investigate a methodology
based on Hidden Markov Models (HMMs) and a number of audio and
visual features. We study both single- and multi-stream approaches.
Using precision and recall as performance measures, the results
suggest that the automatic detection of group interest-level
is promising, and that while audio in general constitutes the
predominant modality in meetings, the use of a multi-modal
approach is beneficial.
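The precision and recall measures used to evaluate detection can be sketched at the frame level as follows. This is an illustrative example only, not the paper's evaluation code; the function name, the binary-label representation, and the sample data are assumptions.

```python
# Hypothetical sketch: frame-level precision and recall for
# high-interest segment detection. A label of 1 marks a frame
# annotated (or detected) as high-interest.

def precision_recall(predicted, ground_truth):
    """Return (precision, recall) over binary per-frame labels."""
    true_pos = sum(1 for p, g in zip(predicted, ground_truth) if p and g)
    pred_pos = sum(predicted)      # frames the detector flagged
    actual_pos = sum(ground_truth)  # frames the annotators flagged
    precision = true_pos / pred_pos if pred_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    return precision, recall

# Example: detector fires on frames 2-4; annotators marked frames 3-5.
pred = [0, 0, 1, 1, 1, 0]
truth = [0, 0, 0, 1, 1, 1]
p, r = precision_recall(pred, truth)
# Two of the three flagged frames are correct, and two of the three
# annotated frames are found, so precision = recall = 2/3 here.
```

In practice such measures are computed over detected segments or frames against the human annotations described above; the choice of granularity (frame vs. segment overlap) affects the scores.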