PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

An efficient memory-based morphosyntactic tagger and parser for Dutch
Antal Van den Bosch, Bertjan Busser, Sander Canisius and Walter Daelemans
In: Computational Linguistics in the Netherlands 2006: Selected papers from the seventeenth CLIN meeting (2007) OTS, Utrecht, Netherlands, pp. 191-206.


We describe TADPOLE, a modular memory-based morphosyntactic tagger and dependency parser for Dutch. Though the system is primarily aimed at accuracy, its design is also driven by optimizing speed and memory usage: each module is built on a trie-based approximation of k-nearest neighbor classification. We evaluate its three main modules: a part-of-speech tagger, a morphological analyzer, and a dependency parser. All three are trained on manually annotated material available for Dutch; the parser is additionally trained on automatically parsed data. A global analysis of the system shows that it processes text in linear time, at an estimated rate close to 2,500 words per second, while maintaining sufficient accuracy.
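
To make the idea of a trie-based approximation of k-nearest neighbor classification more concrete, here is a minimal Python sketch. It is not the system's actual implementation: the feature order, the majority-label fallback, the toy Dutch examples, and all names (TrieNode, build_trie, classify) are assumptions made for illustration. Training examples are stored as paths through a trie, one feature value per level, and classification follows the query's feature values as far as they match, returning the most frequent label seen at the deepest node reached.

```python
from collections import Counter


class TrieNode:
    """One trie level: child nodes per feature value plus label counts."""

    def __init__(self):
        self.children = {}
        self.counts = Counter()

    def default_label(self):
        # Most frequent label among the training examples passing through here.
        return self.counts.most_common(1)[0][0]


def build_trie(examples):
    """Insert each (features, label) pair as a path; features are assumed
    to be ordered from most to least informative (hypothetical ordering)."""
    root = TrieNode()
    for features, label in examples:
        node = root
        node.counts[label] += 1
        for value in features:
            node = node.children.setdefault(value, TrieNode())
            node.counts[label] += 1
    return root


def classify(root, features):
    """Follow matching feature values; on the first mismatch, fall back to
    the majority label of the deepest node reached."""
    node = root
    for value in features:
        child = node.children.get(value)
        if child is None:
            break
        node = child
    return node.default_label()


# Toy data (invented): features are (focus word, previous tag).
examples = [
    (("de", "<s>"), "DET"),
    (("hond", "DET"), "N"),
    (("blaft", "N"), "V"),
    (("de", "V"), "DET"),
    (("kat", "DET"), "N"),
    (("fiets", "DET"), "N"),
]
root = build_trie(examples)
print(classify(root, ("hond", "DET")))  # exact match on a stored path
print(classify(root, ("de", "ADJ")))    # partial match: backs off after "de"
print(classify(root, ("muis", "DET")))  # unseen word: backs off to the root
```

In this sketch a lookup visits at most one node per feature, so its cost depends on the number of features rather than on the number of stored examples, which is the kind of property that makes per-token processing time roughly constant.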

EPrint Type: Book Section
Project Keyword: UNSPECIFIED
Subjects: Natural Language Processing
ID Code: 3894
Deposited By: Walter Daelemans
Deposited On: 25 February 2008