Learning Sub-structures of Document Semantic Graphs
for Document Summarization
Jure Leskovec, Marko Grobelnik and Natasa Milic-Frayling
In: LinkKDD 2004, 22-25 Aug 2004, Seattle.
In this paper we present a method for summarizing a document by
creating a semantic graph of the original document and
identifying the substructure of such a graph that can be used to
extract sentences for a document summary. We start with deep
syntactic analysis of the text and, for each sentence, extract
logical form triples, subject–predicate–object. We then apply
cross-sentence pronoun resolution, co-reference resolution, and
semantic normalization to refine the set of triples and merge them
into a semantic graph. This procedure is applied to both
documents and corresponding summary extracts. We train a linear
Support Vector Machine on the logical form triples to learn how
to extract triples that belong to sentences in document summaries.
The classifier is then used for automatic creation of document
summaries of test data. Our experiments with the DUC 2002 data
show that increasing the set of attributes to include semantic
properties and topological graph properties of logical triples
yields a statistically significant improvement in the micro-average
F1 measure for the extracted summaries. We also observe that
attributes describing various aspects of the semantic graph are
weighted highly by the SVM in the learned model.
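
To make the pipeline described above more concrete, the following is a minimal sketch of two of its pieces: merging subject-predicate-object triples into a directed graph with simple topological attributes, and computing a micro-averaged F1 score. The function names, the choice of node degree as the topological attribute, and the toy triples are illustrative assumptions, not the attribute set or data used in the paper.

```python
from collections import defaultdict


def build_semantic_graph(triples):
    """Merge (subject, predicate, object) triples into a directed graph.

    Each triple adds one labelled edge subject -> object. Triples that
    share a subject or object string end up attached to the same node.
    """
    out_edges = defaultdict(set)  # node -> {(predicate, target)}
    in_edges = defaultdict(set)   # node -> {(predicate, source)}
    for subj, pred, obj in triples:
        out_edges[subj].add((pred, obj))
        in_edges[obj].add((pred, subj))
    return out_edges, in_edges


def triple_attributes(triple, out_edges, in_edges):
    """Return degree-based attributes for one triple (assumed features)."""
    subj, _pred, obj = triple
    return {
        "subj_out_degree": len(out_edges.get(subj, ())),
        "subj_in_degree": len(in_edges.get(subj, ())),
        "obj_out_degree": len(out_edges.get(obj, ())),
        "obj_in_degree": len(in_edges.get(obj, ())),
    }


def micro_f1(counts):
    """Micro-averaged F1 from per-document (tp, fp, fn) counts.

    Counts are summed over all documents before precision and recall
    are computed, so larger documents contribute more to the score.
    """
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical triples, as they might look after pronoun resolution.
example_triples = [
    ("tom", "chase", "jerry"),
    ("jerry", "eat", "cheese"),
    ("tom", "own", "house"),
]
out_edges, in_edges = build_semantic_graph(example_triples)
features = triple_attributes(example_triples[0], out_edges, in_edges)
```

In a full system, attribute dictionaries like `features` would be converted to numeric vectors and passed to a linear SVM that predicts whether each triple belongs to a summary sentence; that step is omitted here.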