PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Discrete versus probabilistic sequence classifiers for domain-specific entity chunking
Sander Canisius, Antal van den Bosch and Walter Daelemans
In: Proceedings of the Eighteenth Belgian-Dutch Conference on Artificial Intelligence, BNAIC-2006 (2006), BNVKI, Namur, Belgium, pp. 75-82.

Abstract

We present a comparative case study of discrete and probabilistic sequence classification methods applied to two real-world entity chunking tasks in the medical domain. We show that a discrete version of maximum-entropy models that does not coordinate its decisions is outperformed both by architecturally augmented discrete versions and by probabilistic versions combined with an inference step to select the best output label sequence. In addition, we show that among the various sequence-aware methods evaluated in this study, be they discrete or probabilistic, no significant performance difference could be observed. This suggests that probabilistic sequence labelling methods are not fundamentally better suited than augmented discrete approaches to the type of sequence-oriented entity chunking tasks evaluated in this study. Future research should establish whether this result generalises to more types of sequence tasks in natural language processing.

EPrint Type: Book Section
Project Keyword: UNSPECIFIED
Subjects: Natural Language Processing; Theory & Algorithms
ID Code: 2947
Deposited By: Walter Daelemans
Deposited On: 27 December 2006