PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Inductive Improvement of Part-of-Speech Tagging and its Effect on a Terminology of Molecular Biology
Ahmed Amrani, Mathieu Roche, Yves Kodratoff and Oriane Matte-Tailliez
In: AI 2005 (Canadian Conference on Artificial Intelligence), 9-11 May, Canada.

Abstract

In the context of PoS tagging of specialized corpora, we proposed an approach for dealing with the most difficult grammatical tag ambiguities. After standard tagging of a biological corpus with Brill’s tagger, we noted persistent errors that are very hard to correct. As an application, we studied two kinds of cases: first, confusion between past participle, adjective and past tense for verbs ending in "ed"; second, confusion between plural nouns and third-person singular present verbs. Using a user-friendly interface, the expert can correct examples. Then, from these well-annotated examples, we induced rules using PART, a propositional rule induction algorithm. Experimental validation showed an improvement in tagging precision. We also showed that tagging with our tagger, ETIQ, yields a terminology of Molecular Biology of better quality and relevance, with a view to extracting relevant information.
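
The two ambiguity classes named in the abstract can be illustrated with a minimal sketch of how candidate examples might be collected for expert correction and rule induction. This is not the authors' ETIQ implementation. It assumes Penn Treebank tags (VBN, JJ, VBD, NNS, VBZ), and the function names, feature names and context window of one tag on each side are hypothetical choices made for illustration only.

```python
from typing import Dict, List, Optional, Tuple

# Assumption: Penn Treebank tag names; the abstract does not state the tagset.
ED_TAGS = {"VBN", "JJ", "VBD"}  # past participle, adjective, past tense
S_TAGS = {"NNS", "VBZ"}         # plural noun, 3rd person singular present verb


def ambiguity_class(word: str, tag: str) -> Optional[str]:
    """Return "ed" or "s" if the token falls in one of the two studied
    ambiguity classes, otherwise None. (Hypothetical helper.)"""
    lowered = word.lower()
    if lowered.endswith("ed") and tag in ED_TAGS:
        return "ed"
    if lowered.endswith("s") and tag in S_TAGS:
        return "s"
    return None


def extract_examples(sentence: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Collect one feature record per ambiguous token in a tagged sentence.

    Each record holds the word, the tag assigned by the initial tagger, and
    the tags of the neighbouring tokens. An expert would then confirm or
    correct the tag, and the corrected records would be passed to a rule
    learner such as PART.
    """
    examples = []
    for i, (word, tag) in enumerate(sentence):
        cls = ambiguity_class(word, tag)
        if cls is None:
            continue
        prev_tag = sentence[i - 1][1] if i > 0 else "<S>"
        next_tag = sentence[i + 1][1] if i + 1 < len(sentence) else "</S>"
        examples.append({
            "word": word,
            "class": cls,
            "tagger_tag": tag,
            "prev_tag": prev_tag,
            "next_tag": next_tag,
        })
    return examples
```

Calling `extract_examples` on a tagged sentence returns only the tokens that need review, so the expert does not have to inspect the whole corpus.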

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Natural Language Processing
ID Code: 1811
Deposited By: Mathieu Roche
Deposited On: 28 November 2005