A Critical Survey of the Methodology for IE Evaluation
Alberto Lavelli, Mary Elaine Califf, Fabio Ciravegna, Dayne Freitag, Claudio Giuliano, Nicholas Kushmerick and Lorenza Romano
In: Proceedings of the 4th International Conference on Language Resources and Evaluation, May 26-28, 2004, Lisbon, Portugal.
We survey the evaluation methodology adopted in Information Extraction (IE), as defined in the MUC conferences and in later independent
efforts applying machine learning to IE. We point out a number of problematic issues that may hamper the comparison of
results obtained by different researchers. Some of them are common to other NLP tasks: e.g., the difficulty of exactly identifying the
effects on performance of the data (sample selection and sample size), of the domain theory (features selected), and of algorithm parameter
settings. Issues specific to IE evaluation include: how leniently to assess inexact identification of filler boundaries, the possibility of
multiple fillers for a slot, and how the counting is performed. We argue that, when specifying an information extraction task, a number of
characteristics should be clearly defined. However, published papers usually make only a few of them explicit. Our aim is to elaborate
a clear and detailed experimental methodology and propose it to the IE community. The goal is to reach widespread agreement
on this proposal so that future IE evaluations will adopt the proposed methodology, making comparisons between algorithms fair and
reliable. In order to achieve this goal, we will develop and make available to the community a set of tools and resources that incorporate
a standardized IE methodology.
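
To illustrate one of the IE-specific issues mentioned above (how leniently to assess inexact identification of filler boundaries), the following sketch shows how the matching criterion alone can change reported precision and recall. It is not taken from the paper; the span representation, function names, and the one-to-one matching policy are assumptions made for this example.

```python
# Minimal sketch (illustrative assumptions, not the paper's scorer):
# spans are (start, end) character offsets; each gold span may be matched once.

def overlaps(a, b):
    """True if two (start, end) spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]

def score(predicted, gold, lenient=False):
    """Precision/recall/F1 for one slot under exact or lenient boundary matching."""
    matched_gold = set()
    correct = 0
    for p in predicted:
        for i, g in enumerate(gold):
            if i in matched_gold:
                continue
            if p == g or (lenient and overlaps(p, g)):
                matched_gold.add(i)
                correct += 1
                break
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Example: one prediction with slightly wrong boundaries.
gold = [(10, 25), (40, 52)]
predicted = [(12, 25), (40, 52)]
print(score(predicted, gold, lenient=False))  # (0.5, 0.5, 0.5)
print(score(predicted, gold, lenient=True))   # (1.0, 1.0, 1.0)
```

The same system output scores 0.5 F1 under exact matching and 1.0 under lenient matching, which is why the survey argues that the matching policy (along with the treatment of multiple fillers and the counting scheme) must be stated explicitly when reporting IE results.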