Posterior based keyword spotting with a priori thresholds
Hamed Ketabdar, Jithendra Vepa, Samy Bengio and Hervé Bourlard
In: International Conference on Spoken Language Processing, Interspeech-ICSLP (2006).
In this paper, we propose a new posterior-based scoring approach for keyword and non-keyword (garbage) elements. The estimation of these scores is based on the definition of HMM state posterior probabilities, taking into account long contextual information and prior knowledge (e.g. keyword model topology). The state posteriors are then integrated into keyword and garbage posteriors for every frame. These posteriors are used to decide, at each frame, whether the keyword is detected. The frame-level decisions are then accumulated (in this case, by counting) to make a global decision on whether the keyword occurs in the utterance. In this way, the contribution of possible outliers is minimized, as opposed to the conventional Viterbi decoding approach, which accumulates likelihoods. Experiments on keywords from the Conversational Telephone Speech (CTS) and Numbers'95 databases are reported. Results show that the new scoring approach leads to a better trade-off between true and false alarms than the Viterbi decoding approach, while also making it possible to precalculate keyword-specific spotting thresholds related to the length of the keywords.
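The counting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `spot_keyword`, the rule "keyword posterior greater than garbage posterior" for the frame-level decision, and the `min_keyword_frames` threshold are all assumptions made for illustration. The state posteriors are taken as given (one row per frame, one column per HMM state), and the keyword and garbage posteriors are assumed to be plain sums over the corresponding states.

```python
def spot_keyword(state_posteriors, keyword_states, min_keyword_frames):
    """Hypothetical sketch: count frame-level keyword decisions.

    state_posteriors: list of frames, each a list of per-state posteriors.
    keyword_states: indices of the states belonging to the keyword model
        (all other states are treated as garbage; an assumption).
    min_keyword_frames: a priori threshold on the frame count, assumed to be
        chosen from the expected keyword length.
    """
    keyword_states = set(keyword_states)
    count = 0
    for frame in state_posteriors:
        # Integrate state posteriors into keyword and garbage posteriors.
        keyword_post = sum(p for s, p in enumerate(frame) if s in keyword_states)
        garbage_post = sum(p for s, p in enumerate(frame) if s not in keyword_states)
        # Frame-level decision (assumed rule: keyword beats garbage).
        if keyword_post > garbage_post:
            count += 1
    # Global decision: enough frames voted for the keyword.
    return count >= min_keyword_frames, count


# Toy posteriors for 5 frames and 3 states; states 0 and 1 form the keyword.
posteriors = [
    [0.1, 0.1, 0.8],
    [0.4, 0.3, 0.3],
    [0.5, 0.4, 0.1],
    [0.2, 0.2, 0.6],
    [0.6, 0.3, 0.1],
]
detected, frame_count = spot_keyword(posteriors, [0, 1], min_keyword_frames=3)
print(detected, frame_count)  # prints: True 3
```

Because each frame contributes at most one vote, a single frame with an extreme score cannot dominate the global decision, which is the outlier argument made in the abstract. A count is also naturally comparable to the keyword's length in frames, which is one way to read the claim about precalculated thresholds.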
|EPrint Type:||Conference or Workshop Item (Oral)|
|Deposited By:||Samy Bengio|
|Deposited On:||22 November 2006|