PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

MUDOS-NG: Multi-document Summaries Using N-gram Graphs (Tech Report)
George Giannakopoulos, George Vouros and Vangelis Karkaletsis
(2010) Technical Report. arXiv.org.

Abstract

This report describes the MUDOS-NG summarization system, which applies a set of language-independent, generic methods to generate extractive summaries. Most of the proposed methods are combinations of simple operators applied to a generic character n-gram graph representation of texts. This work defines the set of operators used on n-gram graphs and proposes applying them to several subtasks of multi-document summarization: document analysis, salient sentence selection, query expansion and redundancy control. Furthermore, it uses a novel chunking methodology, together with a novel way of assigning concepts to sentences for query expansion. Experiments on widely used corpora from the Document Understanding Conferences (DUC) and the Text Analysis Conferences (TAC) yield promising results and provide evidence of the potential of the generic methods introduced. This work aims to identify core methods that exploit the n-gram graph representation, providing a basis for more advanced summarization systems.
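To make the idea of "simple operators on a character n-gram graph" more concrete, here is a minimal, hypothetical sketch. It assumes an undirected graph whose nodes are character n-grams, with an edge linking two n-grams whose start positions lie within a window of `window` characters, weighted by how often that pair co-occurs. The similarity function is modelled loosely on a value-similarity style comparison (per-edge weight ratio, normalised by graph size). The names `ngram_graph` and `value_similarity`, the default parameters, and the exact formula are illustrative assumptions, not the report's implementation.

```python
from collections import Counter


def ngram_graph(text: str, n: int = 3, window: int = 3) -> Counter:
    """Build a simplified (hypothetical) character n-gram graph.

    Nodes are character n-grams. An undirected edge links two n-grams
    whose start positions differ by at most `window`; the edge weight
    counts how many times that pair co-occurs in the text.
    """
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    edges: Counter = Counter()
    for i, g in enumerate(grams):
        # Look back over the preceding n-grams inside the window.
        for j in range(max(0, i - window), i):
            # Sort the pair so (a, b) and (b, a) map to the same undirected edge.
            edges[tuple(sorted((grams[j], g)))] += 1
    return edges


def value_similarity(g1: Counter, g2: Counter) -> float:
    """Assumed graph similarity operator.

    For every edge shared by both graphs, add min(w1, w2) / max(w1, w2);
    then divide by the edge count of the larger graph. Returns a value
    in [0, 1], where 1.0 means identical edge sets and weights.
    """
    if not g1 or not g2:
        return 0.0
    shared = g1.keys() & g2.keys()
    total = sum(min(g1[e], g2[e]) / max(g1[e], g2[e]) for e in shared)
    return total / max(len(g1), len(g2))


if __name__ == "__main__":
    a = ngram_graph("the cat sat")
    b = ngram_graph("the cat ran")
    print(value_similarity(a, a))  # identical graphs
    print(value_similarity(a, b))  # partial overlap, between 0 and 1
```

Undirected edges and a look-back window were chosen here only to keep the sketch short; the representation described in the report may differ, for example in edge direction, window semantics, or how weights are normalised.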

EPrint Type: Monograph (Technical Report)
Project Keyword: UNSPECIFIED
Subjects: Natural Language Processing; Information Retrieval & Textual Information Access
ID Code: 7599
Deposited By: George Giannakopoulos
Deposited On: 17 March 2011