Inductive Improvement of Part-of-Speech Tagging and its Effect on a Terminology of Molecular Biology
In the context of PoS tagging of specialized corpora, we proposed an approach for handling the most difficult grammatical tag ambiguities. After tagging a biological corpus with Brill’s tagger, we observed persistent errors that are very hard to correct. We studied two kinds of cases: first, confusion between past participle, adjective and past tense for words ending in « ed »; second, confusion between plural nouns and third-person singular present-tense verbs. Through a user-friendly interface, an expert corrects examples of these cases. Then, from these well-annotated examples, we induced rules using PART, a propositional rule induction algorithm. Experimental validation showed an improvement in tagging precision. We also showed that the tagging produced by our tagger ETIQ is better suited to obtaining a terminology of Molecular Biology of good quality and relevance, with a view to extracting relevant information.
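To make the two ambiguity classes concrete, the following minimal sketch shows how ambiguous tokens in an already tagged sentence could be turned into attribute-value examples of the kind a propositional rule learner such as PART takes as input. It is an illustration only, not the ETIQ implementation: the Penn Treebank tag names (VBN, VBD, JJ, NNS, VBZ), the context window of two tags on each side, and the function names `ambiguity_class` and `extract_examples` are all assumptions.

```python
from typing import Dict, List, Optional, Tuple

TaggedToken = Tuple[str, str]  # (word, tag)

# Assumed Penn Treebank tags for the two ambiguity classes.
ED_TAGS = {"VBN", "VBD", "JJ"}   # past participle, past tense, adjective
S_TAGS = {"NNS", "VBZ"}          # plural noun, 3rd person singular present verb


def ambiguity_class(word: str, tag: str) -> Optional[str]:
    """Return 'ed' or 's' if the token falls in a studied ambiguity class."""
    lower = word.lower()
    if lower.endswith("ed") and tag in ED_TAGS:
        return "ed"
    if lower.endswith("s") and tag in S_TAGS:
        return "s"
    return None


def extract_examples(sentence: List[TaggedToken], window: int = 2) -> List[Dict[str, str]]:
    """Build one attribute-value row per ambiguous token.

    Each row holds the word, its current tag, and the tags of its
    neighbours; an expert would then supply the correct tag as the label.
    """
    examples = []
    for i, (word, tag) in enumerate(sentence):
        cls = ambiguity_class(word, tag)
        if cls is None:
            continue
        row = {"class": cls, "word": word.lower(), "current_tag": tag}
        for off in range(1, window + 1):
            left, right = i - off, i + off
            # Pad with sentence-boundary markers outside the sentence.
            row[f"tag-{off}"] = sentence[left][1] if left >= 0 else "BOS"
            row[f"tag+{off}"] = sentence[right][1] if right < len(sentence) else "EOS"
        examples.append(row)
    return examples
```

Rows produced this way, once labelled with the corrected tag, could be exported to any propositional learner; the choice of attributes here is hypothetical and would need to be adapted to the corpus.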