PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

A lemmatization web service based on machine learning techniques
Joel Plisson, Dunja Mladenić, Nada Lavrač and Tomaž Erjavec
In: 2nd Language and Technology Conference, 21-23 April 2005, Poznan, Poland.

Abstract

Lemmatization is the process of finding the normalized form (lemma) of a word from the surface word-forms that appear in running text. It is a useful pre-processing step for many language engineering tasks and is especially important for languages with rich inflectional morphology. This paper presents two approaches to automated word lemmatization, both of which use machine learning techniques to learn language-specific models from pre-annotated data. One approach is based on Ripple Down Rules and the other on First-Order Decision Lists as learned by the CLog system. We have tested the two approaches on the Slovene language and set up a generally accessible Web service for lemmatization using the generated models.
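
To illustrate the general idea behind Ripple Down Rules (a default rule refined by progressively more specific exceptions), the following is a minimal Python sketch. It is not the authors' system or the Web service: the `Rule` class, the `lemmatize` function, and the hand-written English suffix rules are all hypothetical and chosen only for readability, whereas the paper learns its rules from pre-annotated Slovene data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Rule:
    """One node of a toy Ripple Down Rules tree (hypothetical structure)."""
    suffix: str        # condition: the word must end with this suffix
    replacement: str   # conclusion: replace that suffix with this string
    exceptions: List["Rule"] = field(default_factory=list)

    def apply(self, word: str) -> Optional[str]:
        # Return None when the condition does not hold for this word.
        if not word.endswith(self.suffix):
            return None
        # The first matching exception overrides this rule's conclusion.
        for exc in self.exceptions:
            result = exc.apply(word)
            if result is not None:
                return result
        return word[: len(word) - len(self.suffix)] + self.replacement


# Hand-written toy rules (assumption: English examples, not learned rules).
# The root is the default rule: it matches every word and leaves it unchanged.
ROOT = Rule("", "", [
    Rule("ies", "y"),
    Rule("ing", "", [Rule("nning", "n")]),
    Rule("s", "", [Rule("ss", "ss")]),
])


def lemmatize(word: str) -> str:
    """Lemmatize a word with the toy rule tree above."""
    result = ROOT.apply(word.lower())
    # The root rule has an empty suffix, so it always produces a result.
    assert result is not None
    return result


if __name__ == "__main__":
    for w in ["cities", "walking", "running", "glass", "cats", "dog"]:
        print(w, "->", lemmatize(w))
```

Each exception only needs to be consulted when its parent rule already applies, which is what lets a rule set of this kind be corrected locally by adding a more specific exception rather than rewriting existing rules.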

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Natural Language Processing; Information Retrieval & Textual Information Access
ID Code: 2330
Deposited By: Dunja Mladenić
Deposited On: 22 November 2006