Improving unsegmented dialogue turns annotation with n-gram transducers.
The statistical models used in dialogue systems need annotated data (dialogues) to estimate their parameters. Dialogues are usually annotated in terms of Dialogue Acts (DA). The annotation problem can itself be tackled with statistical models, which avoids annotating the dialogues from scratch. Most previous work on automatic statistical annotation assumes that the dialogue turns are already segmented into the corresponding meaningful units. However, this segmentation is not usually available. More recent work has attempted to annotate unsegmented turns using an extension of the models used in the segmented case, but reported a dramatic decrease in performance. In this work we propose an enhanced annotation technique based on n-gram transducers that is more accurate than the classical HMM-based model for the annotation and segmentation of unsegmented turns.