Enhancing Audio Speech using Visual Speech Features
This work presents a novel approach to speech enhancement that exploits the bimodality of speech and the correlation between audio and visual speech features. A visually-derived Wiener filter is developed for the enhancement. Its clean speech statistics are obtained from visual features by modelling the joint density of audio and visual features and making a maximum a posteriori estimate of clean audio from the visual speech features. Its noise statistics rely on an audio-visual voice activity detector that classifies input audio as speech or non-speech, enabling a noise model to be updated. Analysis shows the estimation of speech and noise statistics to be effective, and human listening tests measure the effectiveness of the resulting Wiener filter.
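The pipeline described above can be illustrated with a minimal sketch, which is not the implementation used in this work. It rests on several assumptions that the text does not state: the joint density is taken to be a single Gaussian (so the maximum a posteriori estimate reduces to the conditional mean), the noise model is a recursively averaged power spectrum that is updated only in frames classified as non-speech, and the Wiener gain is applied per frequency bin. All function names, the smoothing factor alpha and the gain floor are hypothetical.

```python
import numpy as np


def map_audio_from_visual(v, mu_a, mu_v, cov_av, cov_vv):
    """MAP estimate of an audio feature vector from a visual feature vector.

    Assumption: audio and visual features are jointly Gaussian, in which
    case the MAP estimate is the conditional mean
        mu_a + cov_av @ inv(cov_vv) @ (v - mu_v).
    """
    return mu_a + cov_av @ np.linalg.solve(cov_vv, v - mu_v)


def update_noise_psd(noise_psd, noisy_psd, is_speech, alpha=0.9):
    """Update the noise power spectrum only when the frame is non-speech.

    Assumption: first-order recursive averaging with smoothing factor alpha.
    """
    if is_speech:
        return noise_psd
    return alpha * noise_psd + (1.0 - alpha) * noisy_psd


def wiener_gain(speech_psd, noise_psd, floor=1e-3):
    """Per-bin Wiener gain S / (S + N), floored to avoid zeroing bins."""
    gain = speech_psd / (speech_psd + noise_psd + 1e-12)
    return np.maximum(gain, floor)


def enhancement_gains(noisy_psds, speech_psds, vad_flags, alpha=0.9):
    """Compute one gain vector per frame.

    noisy_psds  : (frames, bins) power spectra of the noisy input
    speech_psds : (frames, bins) clean speech power spectra, e.g. derived
                  from the visually estimated audio features
    vad_flags   : (frames,) booleans, True where the detector reports speech
    Assumption: the first frame is noise only and initialises the model.
    """
    noise_psd = np.array(noisy_psds[0], dtype=float)
    gains = []
    for noisy_psd, speech_psd, is_speech in zip(noisy_psds, speech_psds, vad_flags):
        noise_psd = update_noise_psd(noise_psd, noisy_psd, is_speech, alpha)
        gains.append(wiener_gain(speech_psd, noise_psd))
    return np.array(gains)
```

In this sketch the enhanced spectrum for each frame would be obtained by multiplying the noisy short-time spectrum by the corresponding gain vector. A mixture model in place of the single Gaussian would change only map_audio_from_visual.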