PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Assessment of NER solutions against the first and second CALBC Silver Standard Corpus
D. Rebholz-Schuhmann, A. Jimeno, Ch. Li, S. Kafkas, I. Lewin, N. Kang, P. Corbett, D. Milward, E. Buyko, E. Beisswanger, K. Hornbostel, A. Kouznetsov, R. Witte, J.B. Laurila, Ch. JO Baker, Ch. Kuo, S. Clematide, F. Rinaldi, R. Farkas, G. Móra, K. Hara, L. Furlong, M. Rautschka, M. Lara Neves, A. Pascual-Montano, Q. Wei, N. Collier, F. Mahbub Chowdhury, A. Lavelli, R. Berlanga, Roser Morante, Vincent Van Asch, Walter Daelemans, J.L. Marina, E. van Mulligen, J. Kors and U. Hahn
Journal of Biomedical Semantics, Volume 2, Supplement 5, 2011. ISSN 2041-1480


Competitions in text mining have been used to measure the performance of automatic text processing solutions against a manually annotated gold standard corpus (GSC). Preparing a GSC is time-consuming and costly, and the final corpus consists of at most a few thousand documents annotated with a limited set of semantic groups. To overcome these shortcomings, the CALBC project partners (PPs) have produced a large-scale annotated biomedical corpus covering four semantic groups by harmonising the annotations of automatic text mining solutions: the first version of the Silver Standard Corpus (SSC-I). The four semantic groups are chemical entities and drugs (CHED), genes and proteins (PRGE), diseases and disorders (DISO) and species (SPE). This corpus was used in the First CALBC Challenge, in which participants were asked to annotate the corpus with their own text processing solutions.

EPrint Type: Article
Project Keyword: UNSPECIFIED
Subjects: Natural Language Processing
ID Code: 9061
Deposited By: Roser Morante
Deposited On: 21 February 2012