Multi-person visual focus of attention from head pose and meeting contextual cues
Sileye Ba and Jean-Marc Odobez
IEEE Trans. on Pattern Analysis and Machine Intelligence, accepted for publication
This paper introduces a novel contextual model for the recognition of people’s visual focus of attention (VFOA) in meetings from audio-visual perceptual cues. More specifically, instead of independently recognizing the VFOA of each meeting participant from that participant’s own head pose, we propose to jointly recognize the participants’ visual attention in order to introduce context-dependent interaction models that relate to group activity and the social dynamics of communication. Meeting contextual information is represented by the location of people, conversational events identifying floor-holding patterns, and a presentation activity variable. By modeling the interactions between the different contexts and their combined, and sometimes contradictory, impact on gazing behavior, our model makes it possible to handle VFOA recognition in difficult task-based meetings involving artifacts, presentations, and moving people. We validated our model through rigorous evaluation on a publicly available and challenging dataset of 12 real meetings (five hours of data). The results demonstrated that integrating the presentation and conversation dynamical context using our model can lead to significant performance improvements.
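To make the idea of context-dependent cue fusion concrete, here is a minimal, hypothetical sketch, not the paper's actual dynamic model: a conversational/presentation context sets a prior over candidate VFOA targets, and the observed head pose supplies a likelihood. All function names, parameters, and target directions below are illustrative assumptions.

```python
import numpy as np

# Hypothetical illustration: fuse a context-dependent prior over VFOA
# targets with a head-pose likelihood. This is a toy Bayesian fusion,
# not the paper's actual model.

TARGETS = ["person_A", "person_B", "slide_screen", "table"]

def head_pose_likelihood(pan, target_pans, kappa=8.0):
    """Von Mises-style likelihood of an observed head pan angle
    (radians) under each candidate VFOA target direction."""
    target_pans = np.asarray(target_pans)
    lik = np.exp(kappa * np.cos(pan - target_pans))
    return lik / lik.sum()

def context_prior(speaker_idx, presentation_active, n_targets=4,
                  speaker_boost=3.0, screen_boost=2.0):
    """Toy prior: attention is drawn to the current speaker and,
    during a presentation, to the slide screen (index 2)."""
    prior = np.ones(n_targets)
    prior[speaker_idx] *= speaker_boost
    if presentation_active:
        prior[2] *= screen_boost
    return prior / prior.sum()

def vfoa_posterior(pan, target_pans, speaker_idx, presentation_active):
    """Posterior over VFOA targets: likelihood times contextual prior."""
    post = head_pose_likelihood(pan, target_pans) * \
           context_prior(speaker_idx, presentation_active)
    return post / post.sum()

# Example: head pan near the screen direction while person_A holds
# the floor during a presentation.
target_pans = [-1.0, 1.0, 0.1, 0.0]  # assumed target directions (rad)
post = vfoa_posterior(pan=0.15, target_pans=target_pans,
                      speaker_idx=0, presentation_active=True)
print(dict(zip(TARGETS, np.round(post, 3))))
```

The sketch captures only the fusion intuition: with the same head pose, the inferred focus shifts depending on who is speaking and whether a presentation is active, which is the kind of contextual disambiguation the paper's joint model exploits.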