PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Delia Rusu, Blaz Fortuna, Marko Grobelnik and Dunja Mladenic
In: SiKDD 2008, 17 Oct 2008, Ljubljana, Slovenia.


Information has become so accessible that it has given rise to an information overload problem. Yet important decisions still have to be made on the basis of the available information. Because it is impossible to read all the relevant content needed to stay informed, one possible solution is to condense the data and obtain the kernel of a text by summarizing it automatically. We present an approach to analyzing text and retrieving valuable information in the form of a semantic graph based on subject-verb-object triplets extracted from sentences. Once the triplets have been generated, we apply several techniques to obtain the semantic graph of the document: coreference and anaphora resolution of named entities, and semantic normalization of triplets. Finally, we describe the automatic document summarization process, which starts from the semantic representation of the text. The experimental evaluation, carried out step by step on several Reuters newswire articles, shows that the proposed approach performs comparably to other existing methodologies. To assess the document summaries we use an automatic summarization evaluation package, which allows us to rank various summarizers.
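
To make the idea of a semantic graph built from subject-verb-object triplets more concrete, the following is a minimal sketch and not the method used in the paper. It assumes the triplets have already been extracted and normalized (the example triplets are invented), and it replaces the summarization step with a simple stand-in heuristic: triplets whose subject and object are the most connected nodes in the graph are kept. All function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical, hand-written triplets (subject, verb, object).
# Extraction, coreference resolution and normalization are assumed done.
triplets = [
    ("company", "acquire", "startup"),
    ("company", "report", "profit"),
    ("startup", "develop", "software"),
    ("analyst", "praise", "company"),
]


def build_graph(triplets):
    """Map each subject node to its outgoing (verb, object) edges."""
    graph = defaultdict(list)
    for subj, verb, obj in triplets:
        graph[subj].append((verb, obj))
    return graph


def node_degree(triplets):
    """Count how many triplets each node takes part in."""
    degree = defaultdict(int)
    for subj, _verb, obj in triplets:
        degree[subj] += 1
        degree[obj] += 1
    return degree


def top_triplets(triplets, k):
    """Stand-in heuristic: keep the k triplets with the best-connected endpoints."""
    degree = node_degree(triplets)
    ranked = sorted(
        triplets,
        key=lambda t: degree[t[0]] + degree[t[2]],
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    graph = build_graph(triplets)
    for subj, edges in graph.items():
        for verb, obj in edges:
            print(f"{subj} --{verb}--> {obj}")
    print(top_triplets(triplets, 2))
```

The degree-based ranking is chosen only because it is easy to follow; any scoring function over the graph could be substituted at the same point.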

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Natural Language Processing; Information Retrieval & Textual Information Access
ID Code: 4970
Deposited By: Jan Rupnik
Deposited On: 24 March 2009