PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Unsupervised segmentation of words into morphemes - Challenge 2005, An Introduction and Evaluation Report
Mikko Kurimo, Mathias Creutz, Matti Varjokallio, Ebru Arisoy and Murat Saraclar
In: PASCAL Challenge Workshop on Unsupervised segmentation of words into morphemes, 12 Apr 2005, Venice, Italy.

Abstract

The objective of the challenge on unsupervised segmentation of words into morphemes, or the Morpho Challenge for short, was to design a statistical machine learning algorithm that segments words into morphemes, the smallest meaning-bearing units of language. Ideally, these are basic vocabulary units suitable for different tasks, such as speech and text understanding, machine translation, information retrieval, and statistical language modeling. The segmentations were evaluated in two complementary ways. In Competition 1, the proposed morpheme segmentations were compared to a linguistic gold-standard morpheme segmentation. In Competition 2, speech recognition experiments were performed in which statistical n-gram language models used the proposed word segments instead of entire words. Data sets were provided for three languages: Finnish, English, and Turkish. Participants were encouraged to apply their algorithm to all of these test languages.
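
As an illustration of the kind of comparison made in Competition 1, the sketch below scores proposed segmentations against a gold standard by the placement of morpheme boundaries. It is a minimal sketch under stated assumptions: the abstract does not say how the comparison was scored, so boundary precision, recall and F-measure are assumed here, and the function names and the toy English words are hypothetical, not taken from the challenge data.

```python
def boundaries(morphs):
    """Return the internal boundary positions (character offsets) of a word
    given as a list of morphs, e.g. ["walk", "ing"] gives {4}."""
    positions = set()
    offset = 0
    for morph in morphs[:-1]:  # no boundary after the last morph
        offset += len(morph)
        positions.add(offset)
    return positions


def boundary_scores(proposed, gold):
    """Boundary precision, recall and F-measure over all words in `gold`.

    Both arguments map a word to its list of morphs. This scoring scheme is
    an assumption made for illustration only.
    """
    hits = n_proposed = n_gold = 0
    for word, gold_morphs in gold.items():
        p = boundaries(proposed[word])
        g = boundaries(gold_morphs)
        hits += len(p & g)        # boundaries placed where the gold has one
        n_proposed += len(p)
        n_gold += len(g)
    precision = hits / n_proposed if n_proposed else 0.0
    recall = hits / n_gold if n_gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy data (hypothetical, not from the challenge data sets).
gold = {
    "walking": ["walk", "ing"],
    "unhappiness": ["un", "happi", "ness"],
    "cats": ["cat", "s"],
}
proposed = {
    "walking": ["walk", "ing"],
    "unhappiness": ["unhapp", "iness"],
    "cats": ["cat", "s"],
}

precision, recall, f_measure = boundary_scores(proposed, gold)
print(f"precision={precision:.3f} recall={recall:.3f} F={f_measure:.3f}")
```

In this toy case two of the three proposed boundaries coincide with gold boundaries, and two of the four gold boundaries are found, so precision is 2/3 and recall is 1/2.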

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics; Natural Language Processing; Speech
ID Code: 2393
Deposited By: Mathias Creutz
Deposited On: 22 November 2006