Unsupervised segmentation of words into morphemes - Challenge 2005, An Introduction and Evaluation Report
Mikko Kurimo, Mathias Creutz, Matti Varjokallio, Ebru Arisoy and Murat Saraclar
In: PASCAL Challenge Workshop on Unsupervised segmentation of words into morphemes, April 12, 2006, Venice, Italy.
The objective of the challenge for the unsupervised segmentation of words into morphemes, or Morpho Challenge for short, was to design a statistical machine learning algorithm that segments words into morphemes, the smallest meaning-bearing units of language. Ideally, these are basic vocabulary units suitable for different tasks, such as speech and text understanding, machine translation, information retrieval, and statistical language modeling.
The segmentations were evaluated in two complementary ways:
Competition 1: The proposed morpheme segmentations were compared to a linguistic gold-standard morpheme segmentation.
Competition 2: Speech recognition experiments were performed in which statistical n-gram language models used the proposed word segments instead of entire words.
Data sets were provided for three languages: Finnish, English, and Turkish.
Participants were encouraged to apply their algorithm to all of these test languages.
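
The gold-standard comparison in Competition 1 can be illustrated with a minimal sketch. The abstract does not specify the scoring formula, so the code below assumes one common choice: precision, recall, and F-measure over the positions of morpheme boundaries. The function names, the space-separated segmentation format, and the toy English words are all hypothetical and serve only as illustration.

```python
def boundaries(segmentation):
    """Return the character offsets at which morpheme boundaries fall.

    Assumes morphs are separated by single spaces, e.g. "walk ed".
    """
    positions = set()
    offset = 0
    morphs = segmentation.split()
    for morph in morphs[:-1]:  # no boundary after the last morph
        offset += len(morph)
        positions.add(offset)
    return positions


def boundary_scores(proposed, gold):
    """Compute boundary precision, recall, and F-measure over all gold words."""
    hits = n_proposed = n_gold = 0
    for word, gold_segmentation in gold.items():
        p = boundaries(proposed[word])
        g = boundaries(gold_segmentation)
        hits += len(p & g)       # boundaries placed where the gold has one
        n_proposed += len(p)
        n_gold += len(g)
    precision = hits / n_proposed if n_proposed else 0.0
    recall = hits / n_gold if n_gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy data, invented for illustration only.
gold = {"unhappiness": "un happi ness", "walked": "walk ed"}
proposed = {"unhappiness": "un happiness", "walked": "walk ed"}

precision, recall, f_measure = boundary_scores(proposed, gold)
print(f"precision={precision:.2f} recall={recall:.2f} F={f_measure:.2f}")
```

In this toy case every proposed boundary is correct, but one gold boundary (between "happi" and "ness") is missed, so precision is perfect while recall is lower.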