PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Learning Interestingness Measures in Terminology Extraction. A ROC-based approach
Mathieu Roche, Jérome Azé, Yves Kodratoff and Michele Sebag
In: ROC Analysis in Artificial Intelligence (ROCAI 2004), 22 August 2004, Valencia, Spain.


In Text Mining, a key phase of data preparation is the extraction of terms, i.e. collocations of words attached to specific concepts (e.g. Philosophy-Dissertation). In this paper, Term Extraction is formalized as a supervised learning task: a ranking hypothesis is learned from a set of terms labeled as relevant or irrelevant by the expert. The task is tackled with the evolutionary algorithm ROGER, which optimizes the area under the ROC curve attached to a ranking hypothesis. Empirical validation on two real-world applications shows outstanding improvements over state-of-the-art interestingness measures in Term Extraction. The approach is found to be robust across domains (Molecular Biology, Curriculum Vitae) and languages (English, French).
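
The idea of learning a ranking hypothesis by maximizing the area under the ROC curve can be illustrated with a minimal sketch. This is not ROGER itself, whose details this record does not give. It assumes a linear scoring function over a few per-term measure values and a simple (1+1) evolution strategy; the feature values, labels, and all function names below are hypothetical and chosen only for illustration.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve, computed as the fraction of
    (relevant, irrelevant) pairs ranked in the right order (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one relevant and one irrelevant term")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rank_score(weights, features):
    """Linear ranking hypothesis (an assumption of this sketch)."""
    return sum(w * x for w, x in zip(weights, features))


def evolve_weights(X, y, generations=200, sigma=0.3, seed=0):
    """(1+1) evolution strategy: mutate the weights with Gaussian noise
    and keep the child whenever its AUC is not worse than the parent's."""
    rng = random.Random(seed)
    best = [rng.gauss(0, 1) for _ in range(len(X[0]))]
    best_auc = auc([rank_score(best, x) for x in X], y)
    for _ in range(generations):
        child = [w + rng.gauss(0, sigma) for w in best]
        child_auc = auc([rank_score(child, x) for x in X], y)
        if child_auc >= best_auc:
            best, best_auc = child, child_auc
    return best, best_auc


# Hypothetical toy data: two made-up measure values per candidate term,
# with 1 = relevant and 0 = irrelevant according to the expert.
X = [[0.9, 0.2], [0.7, 0.4], [0.6, 0.9], [0.2, 0.8], [0.1, 0.3], [0.3, 0.6]]
y = [1, 1, 1, 0, 0, 0]

weights, train_auc = evolve_weights(X, y)
print("learned weights:", weights, "training AUC:", train_auc)
```

The pairwise formulation of the AUC makes the objective depend only on the ordering of terms, not on the scale of the scores, which is why it suits a ranking task of this kind.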

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Learning/Statistics & Optimisation; Natural Language Processing
ID Code: 643
Deposited By: Mathieu Roche
Deposited On: 29 December 2004