Enhancing Speaker Recognition with Virtual Examples
Support vector machines (SVMs) combined with Gaussian mixture models (GMMs) based on universal background models (UBMs) have recently emerged as the state-of-the-art approach to speaker recognition. Typically, linear kernel SVMs are defined in a space in which speakers are represented by supervectors. A supervector is formed by stacking the maximum a posteriori (MAP) adapted means of the UBM, given the speaker data, so that a whole speaker conversation is condensed into a single point in the supervector space. Because target data is scarce compared with impostor data, this framework leads to highly imbalanced training. The virtual example (VE) approach, which creates artificial examples from the original labeled ones, is one of the proposed solutions to the imbalanced training problem, and has been successfully applied in tasks such as text and handwriting recognition. In this work, we present preliminary results obtained using VEs in the context of 4-wire and 2-wire speaker recognition.
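As a minimal sketch of the supervector construction described above, the following Python/NumPy code MAP-adapts the component means of a diagonal-covariance UBM to speaker frames and stacks them into a single supervector. The relevance factor, component count, and the final perturbation-based virtual example are illustrative assumptions; the paper's actual VE generation scheme is not specified in this excerpt.

```python
import numpy as np

def map_adapt_means(X, weights, means, covs, relevance=16.0):
    """MAP-adapt UBM component means to speaker frames X (N x D).

    weights: (C,) mixture weights; means: (C, D); covs: (C, D) diagonal
    covariances. relevance is an assumed MAP relevance factor.
    """
    N, D = X.shape
    # Per-frame, per-component log-likelihoods of a diagonal Gaussian.
    log_det = np.sum(np.log(covs), axis=1)                 # (C,)
    diff = X[:, None, :] - means[None, :, :]               # (N, C, D)
    mahal = np.sum(diff ** 2 / covs[None, :, :], axis=2)   # (N, C)
    log_p = -0.5 * (D * np.log(2 * np.pi) + log_det + mahal) + np.log(weights)
    # Component posteriors via a stabilized softmax over components.
    log_p -= log_p.max(axis=1, keepdims=True)
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)                # (N, C)
    # Zeroth- and first-order sufficient statistics.
    n_c = post.sum(axis=0)                                 # (C,)
    Ex = (post.T @ X) / np.maximum(n_c[:, None], 1e-10)    # (C, D)
    # MAP interpolation between the data mean and the UBM prior mean.
    alpha = (n_c / (n_c + relevance))[:, None]             # (C, 1)
    return alpha * Ex + (1.0 - alpha) * means

def supervector(adapted_means):
    """Stack the adapted component means into one supervector."""
    return adapted_means.reshape(-1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, D, N = 8, 12, 300  # illustrative sizes, not the paper's configuration
    weights = np.full(C, 1.0 / C)
    means = rng.normal(size=(C, D))
    covs = np.ones((C, D))
    X = rng.normal(size=(N, D))  # stand-in for one conversation's features
    sv = supervector(map_adapt_means(X, weights, means, covs))
    print(sv.shape)  # the conversation condensed to a single (C*D,) point
    # A hypothetical virtual example: a small perturbation of the target
    # supervector (one simple way to populate the scarce target class).
    ve = sv + rng.normal(scale=0.01, size=sv.shape)
```

With few frames assigned to a component, alpha stays near zero and the adapted mean falls back to the UBM prior, which is the usual motivation for MAP adaptation here.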