Phoneme Alignment Based on Discriminative Learning
Joseph Keshet, Shai Shalev-Shwartz and Yoram Singer
In: Interspeech 2005, 4-8 Sep 2005, Lisbon.
We propose a novel paradigm for aligning the phoneme sequence of a speech utterance with its acoustic signal. Unlike traditional HMM-based approaches, our method uses a discriminative learning procedure in which the learning phase is tightly coupled with the decision task to be performed. The alignment function we devise maps the acoustic and symbolic representations of the speech utterance, together with a candidate alignment, into an abstract vector space. We suggest a specific mapping into this vector space that uses standard speech features (e.g., spectral distances) as well as the confidence outputs of a framewise phoneme classifier. Building on techniques used for large-margin string prediction, our alignment function reduces to a classifier in the abstract vector space that separates correct alignments from incorrect ones. We describe a simple iterative algorithm for learning the alignment function and discuss its formal properties. Experiments with the TIMIT corpus show that our method achieves state-of-the-art results.
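The general idea can be illustrated with a minimal sketch, which is not the paper's implementation. It assumes a linear scoring function w · φ(x, p, s), where φ simply sums hypothetical per-frame feature vectors (for instance, framewise classifier confidences) over the phonemes that an alignment s assigns to each frame. The names `align`, `phi` and `train`, the feature layout, and the perceptron-style update are all assumptions made for illustration; the abstract does not specify the actual feature map or update rule.

```python
from typing import List, Tuple

# frames[t][k] is a (hypothetical) feature vector describing how well
# frame t fits the k-th phoneme of the transcription.
Frames = List[List[List[float]]]


def dot(w: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(w, v))


def align(w: List[float], frames: Frames) -> List[int]:
    """Return the start frame of each phoneme that maximizes the linear score.

    Alignments are monotone and give every phoneme at least one frame,
    so the number of frames must be at least the number of phonemes.
    """
    T, K = len(frames), len(frames[0])
    neg = float("-inf")
    best = [[neg] * K for _ in range(T)]
    back = [[0] * K for _ in range(T)]  # 1 if phoneme k starts at frame t
    best[0][0] = dot(w, frames[0][0])
    for t in range(1, T):
        for k in range(min(t + 1, K)):
            stay = best[t - 1][k]
            move = best[t - 1][k - 1] if k > 0 else neg
            if move > stay:
                best[t][k], back[t][k] = move + dot(w, frames[t][k]), 1
            else:
                best[t][k], back[t][k] = stay + dot(w, frames[t][k]), 0
    # Trace back from the last frame of the last phoneme.
    starts = [0] * K
    k = K - 1
    for t in range(T - 1, 0, -1):
        if back[t][k]:
            starts[k] = t
            k -= 1
    return starts


def phi(frames: Frames, starts: List[int]) -> List[float]:
    """Map an alignment to a vector: the sum of the per-frame features it selects."""
    K, dim = len(starts), len(frames[0][0])
    total = [0.0] * dim
    k = 0
    for t in range(len(frames)):
        while k + 1 < K and starts[k + 1] <= t:
            k += 1
        for i in range(dim):
            total[i] += frames[t][k][i]
    return total


def train(examples: List[Tuple[Frames, List[int]]], dim: int, epochs: int = 10) -> List[float]:
    """Perceptron-style stand-in for the iterative learning algorithm."""
    w = [0.0] * dim
    for _ in range(epochs):
        for frames, true_starts in examples:
            pred = align(w, frames)
            if pred != true_starts:
                good, bad = phi(frames, true_starts), phi(frames, pred)
                # Move w toward the correct alignment and away from the predicted one.
                w = [wi + g - b for wi, g, b in zip(w, good, bad)]
    return w
```

In this sketch, learning and decoding share the same `align` routine, which mirrors the coupling between the learning phase and the decision task that the abstract describes.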
|EPrint Type:||Conference or Workshop Item (Paper)|
|Deposited By:||Shai Shalev-Shwartz|
|Deposited On:||26 August 2005|