PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Learning to Summarise XML Documents by Combining Content and Structure Features
Massih Amini, Anastasios Tombros, Nicolas Usunier, Mounia Lalmas and Patrick Gallinari
In: ACM Fourteenth Conference on Information and Knowledge Management (CIKM 2005), 31 Oct - 5 Nov 2005, Bremen, Germany.

Abstract

Documents formatted in eXtensible Markup Language (XML) are becoming increasingly available in collections of various document types. In this paper, we present an approach for the summarisation of XML documents. The novelty of this approach is that it is based on features drawn not only from the content of documents, but also from their logical structure. We follow a sentence extraction-based summarisation method that employs a novel machine learning approach. To find which features are more effective for producing summaries, this approach views sentence extraction as an ordering task. We evaluated our summarisation model using the INEX and SUMMAC datasets. The results demonstrate that the inclusion of features from the logical structure of documents increases the effectiveness of the summariser, and that the novel machine learning approach is also effective and well-suited to the task of summarisation in the context of XML documents. Our approach is generic and is therefore applicable to elements of varying granularity within the XML tree. We view these results as a step towards the intelligent summarisation of XML documents.
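
To make the idea of treating sentence extraction as an ordering task more concrete, the following is a minimal sketch and not the method used in the paper. It assumes a linear scoring function trained with a pairwise logistic loss, so that summary sentences are scored above non-summary sentences. The two features (overlap with the title as a content feature, and membership of the first paragraph as a structure feature), the function names and the toy data are all hypothetical.

```python
import math


def score(weights, features):
    """Linear score of one sentence: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def train_pairwise_ranker(sentences, epochs=200, lr=0.1):
    """Learn weights that order summary sentences above the others.

    sentences: list of (feature_vector, is_summary_sentence) pairs.
    Assumption: a pairwise logistic loss, chosen for illustration only.
    """
    positives = [f for f, label in sentences if label]
    negatives = [f for f, label in sentences if not label]
    weights = [0.0] * len(sentences[0][0])
    for _ in range(epochs):
        for pos in positives:
            for neg in negatives:
                diff = [a - b for a, b in zip(pos, neg)]
                margin = score(weights, diff)
                # Gradient step on log(1 + exp(-margin)) for this pair.
                step = 1.0 / (1.0 + math.exp(margin))
                weights = [w + lr * step * d for w, d in zip(weights, diff)]
    return weights


def extract_summary(weights, candidates, k):
    """Return the texts of the k highest-scoring (text, features) candidates."""
    ranked = sorted(candidates, key=lambda c: score(weights, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Hypothetical features: [overlap with title, sentence is in first paragraph].
training = [
    ([0.9, 1.0], True),
    ([0.6, 1.0], True),
    ([0.3, 0.0], False),
    ([0.7, 0.0], False),
    ([0.2, 1.0], False),
]
weights = train_pairwise_ranker(training)

candidates = [
    ("A", [0.8, 1.0]),
    ("B", [0.1, 0.0]),
    ("C", [0.5, 1.0]),
]
summary = extract_summary(weights, candidates, k=2)
```

Because the model is linear, the learned weights give a rough indication of how much each feature contributes to the ordering, which is one simple way to compare content features against structure features.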

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Information Retrieval & Textual Information Access
ID Code: 1073
Deposited By: Massih Amini
Deposited On: 04 September 2005