PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Single-Channel Speech Separation using Sparse Non-Negative Matrix Factorization
M. N. Schmidt and R. K. Olsson
In: InterSpeech 2006, Pittsburgh (2006).


We apply machine learning techniques to the problem of separating multiple speech sources from a single microphone recording. The method of choice is a sparse non-negative matrix factorization algorithm, which can learn sparse representations of the data in an unsupervised manner. We use it to learn personalized dictionaries from a speech corpus, and these dictionaries are in turn used to separate the audio stream into its components. We show that computational savings can be achieved by segmenting the training data at the phoneme level; a conventional speech recognizer is used to perform this segmentation. Both the unsupervised and the supervised adaptation schemes yield significant improvements in terms of the target-to-masker ratio.
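The dictionary-learning and separation idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a least-squares cost with an L1 penalty `lam` on the activations `H` and unit-norm dictionary columns, updated multiplicatively in the style of Eggert and Körner. This is one common sparse NMF formulation and may differ from the exact algorithm in the paper. The names `sparse_nmf` and `separate`, the parameter values, the soft (Wiener-style) mask, and the use of magnitude spectrograms as input are all hypothetical choices. The synthetic random matrices only stand in for real speech spectrograms, and the phoneme-level segmentation step is not shown.

```python
import numpy as np


def _normalize(W, eps=1e-12):
    """Scale each dictionary column (atom) to unit Euclidean norm."""
    return W / (np.linalg.norm(W, axis=0, keepdims=True) + eps)


def sparse_nmf(V, n_atoms, lam=0.1, n_iter=200, W=None, update_W=True,
               seed=0, eps=1e-12):
    """Approximate V ~= Wn @ H with unit-norm columns in Wn and H >= 0.

    Assumed cost: 0.5 * ||V - Wn @ H||_F^2 + lam * sum(H).
    If W is given and update_W is False, only H is estimated.
    """
    rng = np.random.default_rng(seed)
    if W is None:
        W = rng.random((V.shape[0], n_atoms)) + eps
    H = rng.random((W.shape[1], V.shape[1])) + eps
    Wn = _normalize(W)
    for _ in range(n_iter):
        # H update: the sparsity weight lam sits in the denominator and shrinks H.
        H *= (Wn.T @ V) / (Wn.T @ (Wn @ H) + lam + eps)
        if update_W:
            R = Wn @ H
            VH = V @ H.T
            RH = R @ H.T
            # Extra terms come from differentiating through the column normalization.
            num = VH + Wn * np.sum(RH * Wn, axis=0, keepdims=True)
            den = RH + Wn * np.sum(VH * Wn, axis=0, keepdims=True)
            Wn = _normalize(Wn * num / (den + eps))
    return Wn, H


def separate(V_mix, W1, W2, lam=0.1, n_iter=200):
    """Split a mixture using two pre-trained (hypothetical) speaker dictionaries."""
    W = np.hstack([W1, W2])
    Wn, H = sparse_nmf(V_mix, W.shape[1], lam=lam, n_iter=n_iter,
                       W=W, update_W=False)
    k = W1.shape[1]
    S1 = Wn[:, :k] @ H[:k]  # part of the mixture explained by speaker 1's atoms
    S2 = Wn[:, k:] @ H[k:]  # part explained by speaker 2's atoms
    # Assumed soft mask, so the two estimates sum exactly to the mixture.
    mask = S1 / (S1 + S2 + 1e-12)
    return mask * V_mix, (1.0 - mask) * V_mix


# Usage on synthetic nonnegative data standing in for two speakers' spectrograms.
rng = np.random.default_rng(1)
V1 = rng.random((64, 8)) @ rng.random((8, 100))
V2 = rng.random((64, 8)) @ rng.random((8, 100))
W1, H1 = sparse_nmf(V1, n_atoms=8)
W2, H2 = sparse_nmf(V2, n_atoms=8)
V_mix = V1 + V2
est1, est2 = separate(V_mix, W1, W2)
rel_err = np.linalg.norm(V1 - W1 @ H1) / np.linalg.norm(V1)
```

Keeping the dictionary columns normalized prevents the L1 penalty from being sidestepped by shrinking `H` while inflating `W`. That is why the `W` update carries the extra normalization terms rather than being the plain NMF update.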

EPrint Type: Conference or Workshop Item (Poster)
Subjects: Learning/Statistics & Optimisation
ID Code: 2692
Deposited By: Rasmus Olsson
Deposited On: 22 November 2006