PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Unsupervised segmentation of words into morphemes - Challenge 2005, An Introduction and Evaluation Report
Mikko Kurimo, Mathias Creutz, Matti Varjokallio, Ebru Arisoy and Murat Saraclar
In: PASCAL Challenge Workshop on Unsupervised segmentation of words into morphemes, April 12, 2006, Venice, Italy.


The objective of the challenge for the unsupervised segmentation of words into morphemes, or Morpho Challenge for short, was to design a statistical machine learning algorithm that segments words into morphemes, the smallest meaning-bearing units of language. Ideally, these are basic vocabulary units suitable for different tasks, such as speech and text understanding, machine translation, information retrieval, and statistical language modeling. The segmentations were evaluated in two complementary ways. Competition 1: the proposed morpheme segmentations were compared to a linguistic gold-standard morpheme segmentation. Competition 2: speech recognition experiments were performed in which statistical n-gram language models used the proposed word segments instead of entire words. Data sets were provided for three languages: Finnish, English, and Turkish. Participants were encouraged to apply their algorithm to all three test languages.
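To make the Competition 1 comparison concrete, here is a minimal sketch of one common way to score a segmentation against a gold standard. It assumes that each segmentation is reduced to its set of internal morpheme boundary positions, and that precision, recall and F-measure are pooled over all words. The function names `boundaries` and `boundary_prf` are hypothetical, and the challenge's official evaluation script may weight or select words differently.

```python
from typing import Dict, List, Set, Tuple


def boundaries(morphs: List[str]) -> Set[int]:
    """Return the character offsets of internal morpheme boundaries.

    For example, ["talo", "i", "ssa"] gives {4, 5}.
    """
    cuts: Set[int] = set()
    pos = 0
    # Every morph except the last one ends at an internal boundary.
    for morph in morphs[:-1]:
        pos += len(morph)
        cuts.add(pos)
    return cuts


def boundary_prf(
    proposed: Dict[str, List[str]], gold: Dict[str, List[str]]
) -> Tuple[float, float, float]:
    """Boundary precision, recall and F-measure pooled over all words.

    Assumption: only words that appear in both dictionaries are scored.
    """
    hits = n_proposed = n_gold = 0
    for word, gold_morphs in gold.items():
        if word not in proposed:
            continue
        p = boundaries(proposed[word])
        g = boundaries(gold_morphs)
        hits += len(p & g)        # boundaries placed correctly
        n_proposed += len(p)      # all boundaries the algorithm proposed
        n_gold += len(g)          # all boundaries in the gold standard
    precision = hits / n_proposed if n_proposed else 0.0
    recall = hits / n_gold if n_gold else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f
```

For example, if the gold standard segments Finnish "taloissa" ("in houses") as talo + i + ssa, and an algorithm proposes talo + issa, the single proposed boundary is correct but one gold boundary is missed. Together with a correctly segmented English "walked" (walk + ed), this gives precision 1.0, recall of about 0.67 and an F-measure of about 0.8. Scoring boundaries rather than whole morphs gives partial credit to near-miss segmentations. This matters for agglutinative languages such as Finnish and Turkish, where words often contain many morphemes.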

EPrint Type: Conference or Workshop Item (Oral)
Subjects: User Modelling for Computer Human Interaction
Natural Language Processing
Information Retrieval & Textual Information Access
ID Code: 2197
Deposited By: Mikko Kurimo
Deposited On: 18 September 2006