PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

A Deep Neural Network for Acoustic-Articulatory Speech Inversion
Benigno Uria, Steve Renals and Korin Richmond
In: NIPS 2011 Workshop on Deep Learning and Unsupervised Feature Learning, 16 Dec 2011, Sierra Nevada, Spain.


In this work, we implement a deep belief network for the acoustic-articulatory inversion mapping problem. We find that adding up to three hidden layers improves inversion accuracy. We also show that this improvement is due to the greater expressive capability of a deep model and is not a consequence of adding more adjustable parameters. Additionally, we show that unsupervised pretraining of the system improves its performance in all cases, even for a model with a single hidden layer. Our implementation obtained an average root mean square error of 0.95 mm on the MNGU0 test dataset, beating all previously published results.
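
The error figure quoted above is an average root mean square error in millimetres. The abstract does not say how the average is formed, so the following is only a minimal sketch under one plausible assumption: RMSE is computed separately for each articulator channel over all test frames, and the per-channel values are then averaged. The function name `average_rmse` and the toy data are hypothetical and are not taken from the paper.

```python
import math


def average_rmse(predicted, measured):
    """Return (mean of per-channel RMSE, list of per-channel RMSE).

    predicted, measured: lists of frames, where each frame is a list of
    articulator coordinates (assumed here to be in millimetres).
    Assumption: the "average RMSE" is the mean over channels of the
    RMSE computed across frames for each channel.
    """
    if not predicted or len(predicted) != len(measured):
        raise ValueError("need two non-empty sequences of equal length")
    n_frames = len(predicted)
    n_channels = len(predicted[0])
    per_channel = []
    for c in range(n_channels):
        # Sum of squared errors for channel c over all frames.
        sq_err = sum((p[c] - m[c]) ** 2 for p, m in zip(predicted, measured))
        per_channel.append(math.sqrt(sq_err / n_frames))
    return sum(per_channel) / n_channels, per_channel


# Toy data: two frames, two channels (values invented for illustration).
measured = [[0.0, 0.0], [0.0, 0.0]]
predicted = [[3.0, 1.0], [4.0, 1.0]]
avg, per = average_rmse(predicted, measured)
print(per, avg)
```

Averaging per-channel values weights every articulator coordinate equally; pooling all squared errors before taking the square root would give a different number, so the convention should be checked against the paper before comparing results.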

EPrint Type: Conference or Workshop Item (Paper)
Subjects: Learning/Statistics & Optimisation
ID Code: 8821
Deposited By: Benigno Uria
Deposited On: 21 February 2012