PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

From the texts to the concepts they contain: a chain of linguistic treatments
Ahmed Amrani, Jérome Azé, Thomas Heitz, Yves Kodratoff and Mathieu Roche
In: The Thirteenth Text REtrieval Conference (TREC 2004), 16-19 December 2004, Gaithersburg, Washington DC.


The text-mining system we are building addresses the specific problem of identifying the instances of relevant concepts present in texts. To that end, our system relies on interaction between a field expert and the various linguistic modules we use, which are often adapted from existing ones such as Brill's tagger or CMU's Link parser. We have developed learning procedures adapted to various steps of the linguistic treatment, mainly grammatical tagging, terminology, and concept learning. Our interaction with the expert differs from classical supervised learning in that the expert is not treated simply as a resource who can provide examples but cannot provide the formalized knowledge underlying them. We are developing specific programming languages that enable the field expert to intervene directly in some of the linguistic tasks. Our approach is thus devoted to helping one expert in one field detect the concepts relevant to his/her field, using a large number of texts. Our approach consists of two steps. The first is an automatic approach that finds relevant and novel sentences in the texts. The second is based on the expert's knowledge and finds more specific relevant sentences. Working on 50 different domains without an expert has been a challenge in itself, and explains our relatively poor results for the first task.

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Natural Language Processing
ID Code: 661
Deposited By: Jérome Azé
Deposited On: 29 December 2004