Unsupervised segmentation of words into morphemes - Challenge 2005, An Introduction and Evaluation Report
The objective of the challenge on unsupervised segmentation of words into morphemes, the Morpho Challenge for short, was to design a statistical machine learning algorithm that segments words into morphemes, the smallest meaning-bearing units of language. Ideally, these are basic vocabulary units suitable for different tasks, such as speech and text understanding, machine translation, information retrieval, and statistical language modeling. The segmentations were evaluated in two complementary ways:

Competition 1: The proposed morpheme segmentations were compared to a linguistic gold-standard morpheme segmentation.

Competition 2: Speech recognition experiments were performed in which statistical n-gram language models used the proposed word segments instead of entire words.

Data sets were provided for three languages: Finnish, English, and Turkish. Participants were encouraged to apply their algorithms to all of these test languages.
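To make the comparison against a gold standard in Competition 1 concrete, the following minimal sketch scores proposed segmentations by where they place morpheme boundaries. The text above does not spell out the metric, so boundary precision, recall, and F-measure are an assumption here. The function names, the space-separated segmentation format, and the toy English words are hypothetical and chosen only for illustration.

```python
def boundaries(segmentation):
    """Return the character offsets at which a morpheme boundary falls.

    `segmentation` is a word with morphs separated by spaces,
    e.g. "un help ful" has boundaries after offsets 2 and 6.
    """
    morphs = segmentation.split()
    positions, offset = set(), 0
    for morph in morphs[:-1]:  # no boundary after the last morph
        offset += len(morph)
        positions.add(offset)
    return positions


def boundary_scores(proposed, gold):
    """Compute boundary precision, recall, and F-measure over the gold words.

    Both arguments map a word to its space-separated segmentation.
    A word missing from `proposed` is treated as left unsegmented.
    """
    hits = n_proposed = n_gold = 0
    for word, gold_seg in gold.items():
        p = boundaries(proposed.get(word, word))
        g = boundaries(gold_seg)
        hits += len(p & g)       # boundaries placed correctly
        n_proposed += len(p)
        n_gold += len(g)
    precision = hits / n_proposed if n_proposed else 0.0
    recall = hits / n_gold if n_gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy data (hypothetical, not taken from the challenge data sets).
gold = {"unhelpful": "un help ful", "walked": "walk ed"}
proposed = {"unhelpful": "un helpful", "walked": "wal ked"}

precision, recall, f_measure = boundary_scores(proposed, gold)
print(precision, recall, f_measure)
```

In this toy case one of the two proposed boundaries matches the gold standard and one of the three gold boundaries is found, so precision is higher than recall. An over-segmenting algorithm would show the opposite pattern.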