PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Generative vs Discriminative Approaches to Entity Recognition from Label-Deficient Data
Cyril Goutte, Eric Gaussier, Nicola Cancedda and Herve Dejean
In: JADT 2004, 7èmes journées internationales analyse statistique des données textuelles, March 10-12, 2004, Louvain-la-Neuve, Belgium.

Abstract

Annotating biomedical text for Named Entity Recognition (NER) is usually a tedious and expensive process, while unannotated data is freely available in large quantities. It is therefore natural to address biomedical NER with Machine Learning techniques that learn from a combination of labelled and unlabelled data. We consider two approaches: one discriminative, using Support Vector Machines, and one generative, using mixture models. We compare the two on a biomedical NER task with various levels of annotation and different similarity measures. We also investigate Fisher kernels as a way to combine the strengths of both approaches. Overall, the discriminative approach using standard similarity measures seems to outperform both the generative approach and the Fisher kernels.

EPrint Type: Conference or Workshop Item (Paper)
Additional Information: http://www.xrce.xerox.com/Publications/Display-Abstract.php?ReportID=1191
Subjects: Learning/Statistics & Optimisation
Natural Language Processing
Information Retrieval & Textual Information Access
ID Code: 551
Deposited By: Cyril Goutte
Deposited On: 25 December 2004