PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Detecting events in a million New York Times articles
Tristan Snowsill, Ilias Flaounas, Tijl De Bie and Nello Cristianini
Lecture Notes in Computer Science Volume 6323, pp. 615-618, 2010.

Abstract

We present a demonstration of a newly developed text-stream event detection method, applied to over a million articles from the New York Times corpus. The method is designed to operate predominantly online, reporting new events within a specified timeframe. Events are detected as significant changes in the statistical properties of the text, where those properties are efficiently stored and updated in a suffix tree. The demonstration shows that the method discovers both short- and long-term events (often called topics) and automatically copes with topic drift on a corpus of 1 035 263 articles.
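
The core idea of flagging an event when text statistics change significantly between time windows can be illustrated with a minimal sketch. This is not the authors' method. It assumes a plain `Counter` of word n-grams in place of the suffix tree, and a two-proportion z-score in place of whatever significance test the paper uses. The function names (`ngram_counts`, `burst_scores`, `detect_events`) and parameters (`n_max`, `min_count`, `threshold`) are hypothetical.

```python
from collections import Counter
from math import sqrt


def ngram_counts(docs, n_max=2):
    """Count word n-grams (n = 1..n_max) over a list of documents.

    Assumption: a flat Counter stands in for the suffix tree described
    in the abstract; it stores the same n-gram frequencies, less compactly.
    """
    counts = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        for n in range(1, n_max + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return counts


def burst_scores(reference, current, n_max=2, min_count=3):
    """Two-proportion z-score of each n-gram's frequency in the current
    window versus the reference window (an assumed significance test)."""
    ref = ngram_counts(reference, n_max)
    cur = ngram_counts(current, n_max)
    ref_total = sum(ref.values())
    cur_total = sum(cur.values())
    if ref_total == 0 or cur_total == 0:
        return {}  # nothing to compare against
    scores = {}
    for gram, c in cur.items():
        if c < min_count:
            continue  # ignore rare n-grams, whose estimates are noisy
        r = ref.get(gram, 0)
        p_cur = c / cur_total
        p_ref = r / ref_total
        pooled = (c + r) / (cur_total + ref_total)
        se = sqrt(pooled * (1 - pooled) * (1 / cur_total + 1 / ref_total))
        if se > 0:
            scores[gram] = (p_cur - p_ref) / se
    return scores


def detect_events(reference, current, threshold=3.0, **kwargs):
    """Return n-grams whose frequency rose significantly, strongest first."""
    scores = burst_scores(reference, current, **kwargs)
    flagged = [g for g, z in scores.items() if z > threshold]
    return sorted(flagged, key=lambda g: -scores[g])


if __name__ == "__main__":
    reference = ["the market rose today", "the team won the game",
                 "rain expected in the city"] * 5
    current = ["volcano erupts near the city", "ash from the volcano erupts again",
               "the market fell as volcano erupts"] * 2
    print(detect_events(reference, current))
```

In an online setting the reference window would slide forward as new articles arrive, so terms that stay frequent are gradually absorbed into the baseline. That is one simple way to picture coping with topic drift, though the paper's actual mechanism may differ.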

EPrint Type: Article
Project Keyword: Unspecified
Subjects: Computational, Information-Theoretic Learning with Statistics; Information Retrieval & Textual Information Access
ID Code: 7256
Deposited By: Tristan Snowsill
Deposited On: 14 March 2011