PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Unsupervised Morpheme Analysis Evaluation by a Comparison to a Linguistic Gold Standard -- Morpho Challenge 2008
Mikko Kurimo and Matti Varjokallio
In: Workshop of the Cross-Language Evaluation Forum, CLEF 2008, Aarhus, Denmark (2008).

Abstract

The goal of Morpho Challenge 2008 was to find and evaluate unsupervised algorithms that provide morpheme analyses for words in different languages. Especially in morphologically complex languages, such as Finnish, Turkish and Arabic, morpheme analysis is important for lexical modeling of words in speech recognition, information retrieval and machine translation. The evaluation in Morpho Challenge competitions consisted of both a linguistic and an application-oriented performance analysis. This paper describes an evaluation where the competition entries were compared to a linguistic morpheme analysis gold standard. Because the morpheme labels in an unsupervised analysis can be arbitrary, the evaluation is based on matching the morpheme-sharing words between the proposed and the gold standard analyses. In addition to the Finnish, Turkish, German and English evaluations performed in Morpho Challenge 2007, the competition this year had an additional evaluation in Arabic. The results in 2008 show that although the level of precision and recall varies substantially between the tasks in different languages, the best methods seem to manage all the tested languages quite well. The Morpho Challenge was part of the EU Network of Excellence PASCAL Challenge Program and organized in collaboration with CLEF.
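
The idea of matching morpheme-sharing words can be illustrated with a minimal sketch. It is not the challenge's actual scoring script: the function names, the toy data, and the exhaustive enumeration of all word pairs are assumptions made for brevity, and the procedure used in the paper may differ (for example in how word pairs are selected and weighted). The point it shows is that arbitrary labels such as "m1" never need to be mapped to gold labels. Only the question of whether two words share some morpheme is compared.

```python
from itertools import combinations


def sharing_pairs(analyses):
    """Return the set of unordered word pairs that share at least one morpheme label."""
    pairs = set()
    # Sorting makes each pair appear in one canonical order.
    for w1, w2 in combinations(sorted(analyses), 2):
        if set(analyses[w1]) & set(analyses[w2]):
            pairs.add((w1, w2))
    return pairs


def pair_precision_recall(proposed, gold):
    """Compare morpheme-sharing word pairs; labels themselves are never compared."""
    words = set(proposed) & set(gold)
    prop_pairs = sharing_pairs({w: proposed[w] for w in words})
    gold_pairs = sharing_pairs({w: gold[w] for w in words})
    common = prop_pairs & gold_pairs
    # Precision: share of proposed morpheme-sharing pairs confirmed by the gold standard.
    precision = len(common) / len(prop_pairs) if prop_pairs else 0.0
    # Recall: share of gold-standard morpheme-sharing pairs found in the proposal.
    recall = len(common) / len(gold_pairs) if gold_pairs else 0.0
    return precision, recall


# Toy data invented for illustration; not taken from the challenge.
gold = {
    "cats": ["cat", "+PL"],
    "dogs": ["dog", "+PL"],
    "cat": ["cat"],
    "walked": ["walk", "+PAST"],
}
proposed = {
    "cats": ["m1", "m2"],
    "dogs": ["m3", "m2"],
    "cat": ["m1"],
    "walked": ["m4", "m2"],  # "m2" wrongly conflates plural and past tense
}

print(pair_precision_recall(proposed, gold))
```

In this toy case the proposed analysis recovers every gold-standard pair, but its overly general label "m2" also links "walked" to the plural nouns, which lowers precision while leaving recall untouched.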

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics; Natural Language Processing; Theory & Algorithms; Information Retrieval & Textual Information Access
ID Code: 4303
Deposited By: Mikko Kurimo
Deposited On: 13 March 2009