PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Assessment of NER solutions against the first and second CALBC Silver Standard Corpus
Dietrich Rebholz-Schuhmann, Antonio Jimeno Yepes, Chen Li, Senay Kafkas, Ian Lewin, Ning Kang, Peter Corbett, David Milward, Ekaterina Buyko, Elena Beisswanger, Kerstin Hornbostel, Alexandre Kouznetsov, René Witte, Jonas B. Laurila, Christopher JO Baker, Cheng-Ju Kuo, Simone Clematide, Fabio Rinaldi, Richárd Farkas, Györgi Móra, Kazuo Hara, Laura Furlong, Michael Rautschka, Mariana Lara Neves, Alberto Pascual-Montano, Qi Wei, Nigel Collier, Md. Faisal Mahbub Chowdhury, Alberto Lavelli, Rafael Berlanga, Roser Morante, Vincent Van Asch, Walter Daelemans, José Luis Marina, Erik van Mulligen, Jan Kors and Udo Hahn
In: 4th Symposium on Semantic Mining in Biomedicine (SMBM), 25-26 October 2010, Hinxton, UK.

Abstract

Text mining challenges have been organised to measure the performance of automatic text mining solutions against a manually annotated gold standard corpus (GSC). Preparing a GSC is time-consuming and costly, and the resulting corpus consists of at most a few thousand documents annotated with a limited set of semantic groups. To overcome these shortcomings, the CALBC project partners (PPs) have produced a large-scale annotated biomedical corpus covering four semantic groups by harmonising the annotations produced by automatic text mining solutions. This corpus is the first version of the Silver Standard Corpus (SSC-I). The four semantic groups were chemical entities and drugs (CHED), genes and proteins (PRGE), diseases and disorders (DISO) and species (SPE). The corpus was used for the First CALBC Challenge, which asked participants to annotate it with their own annotation solutions.

EPrint Type: Conference or Workshop Item (Paper)
Subjects: Natural Language Processing
ID Code: 7642
Deposited By: Roser Morante
Deposited On: 17 March 2011