On the Potential for Robust ASR with Combined Subband-Waveform and Cepstral Features
J K Yousafzai, Z Cvetkovic and P Sollich
In: IEEE International Symposium on Information Theory 2011 (2011).
This work explores the potential for robust classification of phonemes in the presence of additive noise and linear filtering using high-dimensional features in the subbands of acoustic waveforms. The proposed technique is compared with state-of-the-art automatic speech recognition (ASR) front-ends on the TIMIT phoneme classification task using support vector machines (SVMs). Two key issues are addressed: selecting appropriate SVM kernels for classification in frequency subbands, and combining the individual subband classifiers using ensemble methods. Experiments demonstrate the benefits of classification in the subbands of acoustic waveforms. In the presence of noise and linear filtering, it outperforms the standard cepstral front-end at all signal-to-noise ratios (SNRs) below a crossover point that lies between 12 dB and 6 dB. Combining the subband-waveform and cepstral classifiers yields further performance improvements over both individual classifiers.
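The two combination steps mentioned in the abstract can be pictured with a minimal sketch. It assumes that each classifier outputs one decision value per phoneme class. It also assumes two simple stand-in rules: an unweighted average across subband classifiers, and a convex combination with the cepstral classifier controlled by a weight `lam`. The paper's actual kernels and ensemble method are not reproduced here and may differ. The function names and the score values are hypothetical and serve only as an illustration.

```python
from typing import Dict, List

# Hypothetical representation: phoneme label -> SVM decision value from one classifier.
Scores = Dict[str, float]


def combine_subbands(subband_scores: List[Scores]) -> Scores:
    """Unweighted average of per-class scores across subband classifiers (assumed ensemble rule)."""
    if not subband_scores:
        raise ValueError("need at least one subband classifier")
    n = len(subband_scores)
    return {c: sum(s[c] for s in subband_scores) / n for c in subband_scores[0]}


def combine_with_cepstral(wave: Scores, cep: Scores, lam: float) -> Scores:
    """Convex combination: lam weights the subband-waveform scores, (1 - lam) the cepstral scores."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return {c: lam * wave[c] + (1.0 - lam) * cep[c] for c in wave}


def classify(scores: Scores) -> str:
    """Pick the phoneme class with the largest combined decision value."""
    return max(scores, key=scores.get)


# Hypothetical decision values for two classes from three subband SVMs and one cepstral SVM.
subbands = [
    {"aa": 1.5, "iy": -1.5},
    {"aa": -0.5, "iy": 0.5},
    {"aa": 0.5, "iy": -0.5},
]
cepstral = {"aa": -1.0, "iy": 1.0}

wave = combine_subbands(subbands)  # {"aa": 0.5, "iy": -0.5}
print(classify(combine_with_cepstral(wave, cepstral, lam=0.75)))  # prints aa
print(classify(combine_with_cepstral(wave, cepstral, lam=0.25)))  # prints iy
```

In this toy example, the weight `lam` decides which classifier dominates when the two disagree. One plausible design, which is an assumption here, would choose a larger `lam` at low SNRs, where the abstract reports that the subband-waveform classifier performs better, and a smaller `lam` above the crossover point.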