Towards Using Hierarchical Posteriors for Flexible Automatic Speech Recognition Systems
Hervé Bourlard, Samy Bengio, Mathew Magimai Doss, Qifeng Zhu, Bertrand Mesot and Nelson Morgan
IDIAP Research Institute.
Local state (or phone) posterior probabilities are often
investigated as local classifiers (e.g., hybrid HMM/ANN systems)
or as transformed acoustic features (e.g., ``Tandem'') to improve
speech recognition systems. In this paper, we present
initial results on boosting these approaches by improving the
local state, phone, or word posterior estimates using all
available acoustic information (i.e., the whole
utterance) as well as any available prior information (such as
topological constraints). Furthermore, this approach yields a
family of new HMM-based systems in which only (local and global)
posterior probabilities are used, and it provides a new,
principled approach to the hierarchical use and integration of
these posteriors, from the frame level up to the sentence level.
Initial experiments on several speech (as well as other multimodal)
tasks yielded significant improvements. Here, we
report recognition results on Numbers'95 and on a reduced-vocabulary
version (1000 words) of the DARPA Conversational
Telephone Speech-to-text (CTS) task.
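
To make the idea of a posterior that depends on the whole utterance and on topological constraints concrete, the following minimal sketch computes per-frame state posteriors with the standard forward-backward recursion for a small HMM. It is an illustration under stated assumptions, not necessarily the estimator used in this work: the function name `state_posteriors`, the argument layout, and the toy inputs are hypothetical, and the local scores are assumed to be (scaled) likelihoods supplied by some frame-level model.

```python
def state_posteriors(init, trans, emis):
    """Return gamma[t][i] = P(q_t = i | x_1..x_T) for a small HMM.

    init[i]     : probability of starting in state i
    trans[i][j] : transition probability i -> j (zeros encode the topology)
    emis[t][i]  : local score of frame t under state i (assumed to be a
                  likelihood or scaled likelihood; hypothetical input)
    """
    T, N = len(emis), len(init)

    # Forward pass, normalised at every frame to avoid numerical underflow.
    alpha = []
    prev = None
    for t in range(T):
        if t == 0:
            row = [init[i] * emis[0][i] for i in range(N)]
        else:
            row = [
                emis[t][j] * sum(prev[i] * trans[i][j] for i in range(N))
                for j in range(N)
            ]
        total = sum(row)
        prev = [v / total for v in row]
        alpha.append(prev)

    # Backward pass, also normalised at every frame.
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        row = [
            sum(trans[i][j] * emis[t + 1][j] * beta[t + 1][j] for j in range(N))
            for i in range(N)
        ]
        total = sum(row)
        beta[t] = [v / total for v in row]

    # Combine both passes; per-frame renormalisation removes the scale factors.
    gamma = []
    for t in range(T):
        row = [alpha[t][i] * beta[t][i] for i in range(N)]
        total = sum(row)
        gamma.append([v / total for v in row])
    return gamma


# Toy left-to-right model: state 0 may stay or move to state 1; state 1 is final.
init = [1.0, 0.0]
trans = [[0.5, 0.5], [0.0, 1.0]]
emis = [[0.6, 0.4], [0.5, 0.5], [0.1, 0.9]]
gamma = state_posteriors(init, trans, emis)
```

In this toy case the middle frame is locally ambiguous (equal scores for both states), so any preference between the two states in `gamma[1]` comes only from the neighbouring frames and from the left-to-right topology.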