Unsupervised Morpheme Analysis Evaluation by a Comparison to a Linguistic Gold Standard -- Morpho Challenge 2008
Mikko Kurimo and Matti Varjokallio
In: Workshop of the Cross-Language Evaluation Forum, CLEF 2008, Aarhus, Denmark (2008).


The goal of Morpho Challenge 2008 was to find and evaluate unsupervised algorithms that provide morpheme analyses for words in different languages. Especially in morphologically complex languages, such as Finnish, Turkish and Arabic, morpheme analysis is important for lexical modeling of words in speech recognition, information retrieval and machine translation. The evaluation in Morpho Challenge competitions consisted of both a linguistic and an application-oriented performance analysis. This paper describes an evaluation where the competition entries were compared to a linguistic morpheme analysis gold standard. Because the morpheme labels in an unsupervised analysis can be arbitrary, the evaluation is based on matching the morpheme-sharing words between the proposed and the gold standard analyses. In addition to the Finnish, Turkish, German and English evaluations performed in Morpho Challenge 2007, the competition this year had an additional evaluation in Arabic. The results in 2008 show that although the level of precision and recall varies substantially between the tasks in different languages, the best methods seem to manage all the tested languages quite well. The Morpho Challenge was part of the EU Network of Excellence PASCAL Challenge Program and organized in collaboration with CLEF.
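
The idea of matching morpheme-sharing words can be illustrated with a minimal sketch. It is not the official Morpho Challenge scoring procedure: it assumes a simplified, exhaustive variant in which every pair of words is checked, precision is the fraction of word pairs sharing a morpheme in the proposed analysis that also share one in the gold standard, and recall is the reverse. The function names, the toy word list and the labels such as `m1` or `cat_N` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def sharing_pairs(analyses):
    """Return the unordered word pairs that share at least one morpheme label."""
    pairs = set()
    # Sorting the words gives every pair a canonical (w1, w2) order.
    for w1, w2 in combinations(sorted(analyses), 2):
        if set(analyses[w1]) & set(analyses[w2]):
            pairs.add((w1, w2))
    return pairs


def pair_precision_recall(proposed, gold):
    """Compare two analyses through their morpheme-sharing word pairs.

    The labels themselves are never compared across analyses, so the
    proposed labels may be arbitrary. This is a simplified assumption,
    not the official evaluation script.
    """
    words = set(proposed) & set(gold)
    prop_pairs = sharing_pairs({w: proposed[w] for w in words})
    gold_pairs = sharing_pairs({w: gold[w] for w in words})
    hits = len(prop_pairs & gold_pairs)
    precision = hits / len(prop_pairs) if prop_pairs else 0.0
    recall = hits / len(gold_pairs) if gold_pairs else 0.0
    return precision, recall


# Toy gold standard with linguistic labels (hypothetical data).
gold = {
    "cats": ["cat_N", "+PL"],
    "dogs": ["dog_N", "+PL"],
    "cat": ["cat_N"],
    "walked": ["walk_V", "+PAST"],
}

# Toy unsupervised output with arbitrary labels; "m7" wrongly merges
# the plural "-s" and the past tense "-ed" into one morpheme.
proposed = {
    "cats": ["m1", "m7"],
    "dogs": ["m2", "m7"],
    "cat": ["m1"],
    "walked": ["m3", "m7"],
}

precision, recall = pair_precision_recall(proposed, gold)
print(precision, recall)  # 0.5 1.0
```

In this toy case the proposed analysis recovers both gold-standard pairs (full recall) but also links "walked" to "cats" and "dogs" through the merged label, which halves the precision.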

