PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Semantics-based analysis and navigation of heterogeneous text corpora: the Porpoise news and blogs engine
Bettina Berendt and Daniel Trümper
In: Web Mining Applications in E-commerce and E-services (2009) Springer, Berlin etc., pp. 45-64.

Abstract

Many information sites, such as news services and search engines, offer options beyond keyword search to help people group and identify relevant sources of information. However, their search options are limited to fixed and mostly syntactic criteria (such as date or source identity) or to semantic criteria that are opaque (“similar documents”). Moreover, these interfaces do not support the systematic comparison and contrasting of different information sources that is central to information literacy. In this paper, we describe Porpoise, a system that provides users with a toolkit for conducting an in-depth semantic analysis of a collection of archives of their choice. Users can analyse, compare and contrast corpora in both a “global” and a “local” fashion. Clustering, semi-automatic ontology learning, a new form of multidimensional nearest-neighbour search, and visualisation help users to navigate a multilingual corpus of news and blogs by semantic criteria of their own design.

EPrint Type: Book Section
Additional Information: see http://www.cs.kuleuven.be/~berendt for access to an electronic version
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Information Retrieval & Textual Information Access
ID Code: 6723
Deposited By: Bettina Berendt
Deposited On: 08 March 2010