PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Audio-Visual Group Recognition using Diffusion Maps
Yosi Keller, Stephane Lafon, Ronald Coifman and Steven Zucker
IEEE Transactions on Signal Processing, Volume 58, Number 1, pp. 403-413, 2009.


Data fusion is a natural and common approach to recovering the state of physical systems, but the dissimilar appearance of data from different sensors remains a fundamental obstacle. We propose a unified embedding scheme for multi-sensory data, based on the spectral diffusion framework, that addresses this issue. Our scheme is purely data-driven and assumes no a priori statistical or deterministic models of the data sources. To extract the underlying structure, we first embed each input channel separately; the resulting structures are then combined in diffusion coordinates. In particular, because different sensors sample similar phenomena with different sampling densities, we apply the density-invariant Laplace-Beltrami embedding. Varying sampling density is a fundamental issue in multisensor acquisition and processing that prior approaches overlooked. We extend previous work on group recognition and suggest a novel approach to the selection of diffusion coordinates. To verify our approach, we demonstrate performance improvements in audio/visual speech recognition.
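
The per-channel embedding described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Gaussian kernel with a hand-picked bandwidth `eps`, uses the standard diffusion-maps normalization with `alpha = 1` as the density-invariant (Laplace-Beltrami) step, and combines channels by simple concatenation of their diffusion coordinates. The function names `diffusion_map` and `fuse_channels` are hypothetical, and the paper's coordinate-selection scheme is not reproduced here.

```python
import numpy as np


def diffusion_map(X, eps, n_coords=2, alpha=1.0):
    """Diffusion-map embedding of the rows of X (one row per sample).

    alpha=1 removes the influence of the sampling density, which is the
    density-invariant (Laplace-Beltrami) normalization.
    """
    # Pairwise squared Euclidean distances, then a Gaussian kernel.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-d2 / eps)

    # Divide out the kernel density estimate q raised to alpha.
    q = K.sum(axis=1)
    K_alpha = K / np.outer(q ** alpha, q ** alpha)

    # Work with the symmetric conjugate of the Markov matrix P = D^-1 K_alpha.
    d = K_alpha.sum(axis=1)
    S = K_alpha / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]  # largest eigenvalue first
    vals, vecs = vals[order], vecs[:, order]

    # Right eigenvectors of P; skip the first (constant) one.
    psi = vecs / np.sqrt(d)[:, None]
    return psi[:, 1:n_coords + 1] * vals[1:n_coords + 1]


def fuse_channels(channels, eps_list, n_coords=2):
    """Embed each channel separately, then stack the coordinates.

    Concatenation is an assumption made for this sketch.
    """
    parts = [diffusion_map(X, eps, n_coords)
             for X, eps in zip(channels, eps_list)]
    return np.hstack(parts)
```

Called with two arrays that share the same number of rows, for example one of audio features and one of visual features for the same time frames, `fuse_channels` returns one row per frame with `n_coords` columns from each channel.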

EPrint Type: Article
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Learning/Statistics & Optimisation
Multimodal Integration
ID Code: 5830
Deposited By: Yosi Keller
Deposited On: 08 March 2010