A Deep Neural Network for Acoustic-Articulatory Speech Inversion
Benigno Uria, Steve Renals and Korin Richmond
In: NIPS 2011 Workshop on Deep Learning and Unsupervised Feature Learning, 16 Dec 2011, Sierra Nevada, Spain.
In this work, we implement a deep belief network for the acoustic-articulatory inversion mapping problem. We find that adding up to three hidden layers improves inversion accuracy, and we show that this improvement is due to the greater expressive capacity of a deep model rather than simply to the larger number of adjustable parameters. Additionally, unsupervised pretraining of the system improves its performance in all cases, even for a model with a single hidden layer. Our implementation obtained an average root mean square error of 0.95 mm on the MNGU0 test dataset, improving on all previously published results.
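To make the supervised stage of such a system concrete, the following is a minimal sketch (not the authors' implementation): a feed-forward network with three sigmoid hidden layers and a linear output, trained by backpropagation to minimise mean squared error on a regression task. The paper's model is a deep belief network whose weights would first be pretrained unsupervised (e.g. as stacked RBMs) before this fine-tuning stage; pretraining is omitted here, and the layer sizes and toy data are illustrative assumptions, not the MNGU0 configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class InversionMLP:
    """Feed-forward regression net: sigmoid hidden layers, linear output."""

    def __init__(self, sizes):
        # sizes = [n_acoustic, h1, h2, h3, n_articulatory]
        self.W = [rng.normal(0.0, 0.1, (a, b))
                  for a, b in zip(sizes[:-1], sizes[1:])]
        self.b = [np.zeros(b) for b in sizes[1:]]

    def forward(self, x):
        # Return activations of every layer, input included.
        acts = [x]
        h = x
        for W, b in zip(self.W[:-1], self.b[:-1]):
            h = sigmoid(h @ W + b)
            acts.append(h)
        acts.append(h @ self.W[-1] + self.b[-1])  # linear output for regression
        return acts

    def train_step(self, x, t, lr=0.3):
        # One full-batch gradient-descent step on the mean squared error.
        acts = self.forward(x)
        n = x.shape[0]
        delta = (acts[-1] - t) / n  # error signal at the linear output
        for i in reversed(range(len(self.W))):
            gW = acts[i].T @ delta
            gb = delta.sum(axis=0)
            if i > 0:  # backpropagate through the sigmoid nonlinearity
                delta = (delta @ self.W[i].T) * acts[i] * (1.0 - acts[i])
            self.W[i] -= lr * gW
            self.b[i] -= lr * gb
        return float(np.mean((acts[-1] - t) ** 2))

# Toy stand-in data: map 8-dim "acoustic" vectors to 2-dim "articulatory" targets.
X = rng.normal(size=(256, 8))
T = np.tanh(X[:, :2]) + 0.1 * X[:, 2:4]

net = InversionMLP([8, 32, 32, 32, 2])
losses = [net.train_step(X, T) for _ in range(800)]
print(f"MSE: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

In the paper's setting the inputs would be acoustic feature frames and the targets electromagnetic-articulography coordinates, with the reported 0.95 mm figure being the RMSE of those predicted coordinates on the MNGU0 test set.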