PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Effective visually-derived Wiener filtering for audio-visual speech processing
Ibrahim Almajai and Ben Milner
In: AVSP 2009, 10 Sep 2009, Norwich, UK.


This work presents a novel approach to speech enhancement that exploits the bimodality of speech and the correlation between audio and visual speech features. A visually-derived Wiener filter is developed for this purpose. It obtains clean speech statistics from visual features by modelling the joint density of audio and visual features and making a maximum a posteriori estimate of clean audio from the visual speech features. Noise statistics for the Wiener filter are obtained with an audio-visual voice activity detector, which classifies input audio as speech or nonspeech and so enables a noise model to be updated. Analysis shows the estimation of speech and noise statistics to be effective, and the quality of the speech produced by the resulting Wiener filter is assessed both objectively and subjectively. The use of this enhancement method for automatic speech recognition (ASR) is also considered.
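
As a rough illustration of the filtering stage described in the abstract, the sketch below applies a textbook Wiener gain, H = S / (S + N), per frequency bin and updates the noise estimate only on frames flagged as nonspeech. It is not the authors' implementation. The visually-derived clean speech estimate and the audio-visual voice activity decisions are taken as given inputs, and the function names, the recursive-averaging update, the smoothing factor `alpha`, the gain floor and the noise-only first frame are all assumptions made for this example.

```python
def wiener_gain(speech_psd, noise_psd, floor=1e-3):
    """Per-bin Wiener gain H = S / (S + N), floored to limit over-suppression."""
    gains = []
    for s, n in zip(speech_psd, noise_psd):
        denom = s + n
        h = s / denom if denom > 0.0 else 0.0
        gains.append(max(h, floor))  # floor value is an assumption
    return gains


def update_noise(noise_psd, frame_psd, is_speech, alpha=0.9):
    """Recursively average the noise estimate, but only on nonspeech frames."""
    if is_speech:
        return list(noise_psd)  # hold the noise model during speech
    return [alpha * n + (1.0 - alpha) * y for n, y in zip(noise_psd, frame_psd)]


def enhance(noisy_psd_frames, speech_psd_frames, vad_flags, alpha=0.9):
    """Filter a sequence of noisy power spectra frame by frame.

    noisy_psd_frames:  power spectra of the noisy input audio
    speech_psd_frames: clean speech power spectra estimated elsewhere
                       (in the paper, from visual features)
    vad_flags:         True where a frame is classified as speech
    """
    # Assumption: the first frame contains noise only.
    noise_psd = list(noisy_psd_frames[0])
    enhanced = []
    for noisy, speech, is_speech in zip(noisy_psd_frames, speech_psd_frames, vad_flags):
        noise_psd = update_noise(noise_psd, noisy, is_speech, alpha)
        gains = wiener_gain(speech, noise_psd)
        # The gain applies to the magnitude spectrum, so power scales by gain squared.
        enhanced.append([g * g * y for g, y in zip(gains, noisy)])
    return enhanced
```

Holding the noise model fixed while speech is present is what keeps the filter from treating speech energy as noise, which is why the voice activity decision matters for the noise statistics.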

EPrint Type: Conference or Workshop Item (Oral)
Project Keyword: Project Keyword UNSPECIFIED
Multimodal Integration
ID Code: 5615
Deposited By: Ibrahim Almajai
Deposited On: 08 March 2010