PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

A Large Margin Algorithm for Speech and Audio Segmentation
Joseph Keshet, Shai Shalev-Shwartz, Yoram Singer and Dan Chazan
IEEE Transactions on Audio, Speech, and Language Processing, 2006.

Abstract

We describe and analyze a discriminative algorithm for learning to segment an audio signal given a sequence of events that tags the signal. We demonstrate the applicability of our method through the tasks of speech phoneme segmentation and music-to-score alignment. In the former task, the events that tag the speech signal are phonemes and in the latter task, the events are musical notes. Our goal is to learn a segmentation function whose input is an audio signal along with its accompanying event sequence and its output is a timing sequence representing the actual start time of each event in the audio signal. Generalizing the notion of separation with a margin used in support vector machines (SVM) for binary classification, we cast the learning task as the problem of finding a direction vector in an abstract vector-space. To do so, we devise a mapping of the input signal and the event sequence along with any possible timing sequence into an abstract vector-space. Thus, each possible timing sequence corresponds to a vector in the vector-space, and the predicted timing sequence is the one whose projection onto a direction vector in this vector-space is maximal. We set the direction vector to be the solution of a minimization problem with a large set of constraints. Each constraint enforces a gap between the projection of the correct target timing sequence and the projection of an alternative, incorrect, timing sequence onto the direction vector. Despite the large number of constraints, we provide a simple iterative algorithm for efficiently learning the direction vector and analyze the formal properties of the resulting learning algorithm. We experiment with our learning algorithm in applications of phonetic segmentation and music-to-score alignment by comparing its performance to the results obtained by a generative hidden Markov model (HMM) for segmentation. Our experiments indicate that the discriminative algorithm significantly outperforms the commonly used HMM-based approach.
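
The prediction rule and the margin constraints described in the abstract can be illustrated with a small sketch. The abstract does not specify the feature mapping, the cost function, or the update rule, so everything below is an assumption made for illustration: the two-dimensional `feature_map`, the average-deviation `cost`, the brute-force enumeration of timing sequences, and the passive-aggressive style step with a hypothetical aggressiveness parameter `C`. It is not the authors' implementation. The sketch only shows the general shape: every candidate timing sequence is mapped to a vector, the prediction is the candidate with the largest projection onto the direction vector `w`, and `w` is updated iteratively whenever an incorrect timing sequence violates the required gap.

```python
from itertools import combinations


def feature_map(signal, events, timing):
    """Hypothetical mapping of (signal, events, timing) to a 2-d vector.

    Toy assumption: each event is a number giving the expected signal level.
    Feature 0 is the negative squared misfit of frames to their assigned event.
    Feature 1 is the total absolute signal jump at the event start times.
    """
    bounds = list(timing) + [len(signal)]
    fit = 0.0
    for k, event in enumerate(events):
        for t in range(bounds[k], bounds[k + 1]):
            fit -= (signal[t] - event) ** 2
    jumps = sum(abs(signal[t] - signal[t - 1]) for t in timing[1:])
    return [fit, jumps]


def candidate_timings(n_frames, n_events):
    """Enumerate all increasing start-time sequences; the first event starts at 0."""
    for rest in combinations(range(1, n_frames), n_events - 1):
        yield (0,) + rest


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def predict(w, signal, events):
    """Return the timing sequence whose feature vector projects maximally onto w."""
    return max(candidate_timings(len(signal), len(events)),
               key=lambda y: dot(w, feature_map(signal, events, y)))


def cost(y_true, y_pred):
    """Assumed cost: average absolute deviation of start times, in frames."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def train(examples, n_features=2, epochs=5, C=1.0):
    """Iteratively enforce a gap between the correct and incorrect timings."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for signal, events, y_true in examples:
            phi_true = feature_map(signal, events, y_true)
            # Find the most violated constraint: required gap minus actual gap.
            worst_y, worst_loss = None, 0.0
            for y in candidate_timings(len(signal), len(events)):
                gap = dot(w, phi_true) - dot(w, feature_map(signal, events, y))
                loss = cost(y_true, y) - gap
                if loss > worst_loss:
                    worst_y, worst_loss = y, loss
            if worst_y is None:
                continue  # every constraint for this example is satisfied
            phi_bad = feature_map(signal, events, worst_y)
            delta = [a - b for a, b in zip(phi_true, phi_bad)]
            norm_sq = dot(delta, delta)
            if norm_sq == 0.0:
                continue
            # Passive-aggressive style step size, capped at C (an assumption).
            tau = min(C, worst_loss / norm_sq)
            w = [wi + tau * di for wi, di in zip(w, delta)]
    return w


# Toy usage: three events with levels 0, 1, 2 starting at frames 0, 3, 6.
signal = [0.0, 0.1, 0.0, 1.0, 0.9, 1.1, 2.0, 2.1]
events = [0.0, 1.0, 2.0]
w = train([(signal, events, (0, 3, 6))])
print(predict(w, signal, events))
```

Brute-force enumeration is used here only to keep the sketch short; it grows combinatorially with the number of frames and events, so a practical system would need a more efficient search over timing sequences.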

EPrint Type: Article
Project Keyword: UNSPECIFIED
Subjects: Learning/Statistics & Optimisation; Natural Language Processing; Speech; Theory & Algorithms
ID Code: 2138
Deposited By: Joseph Keshet
Deposited On: 05 July 2006