Semantics-based analysis and navigation of heterogeneous text corpora: the Porpoise news and blogs engine
Many information sites, such as news services and search engines, offer options beyond keyword search to help people group and identify relevant sources of information. However, these options are limited to fixed, mostly syntactic criteria (such as date or source identity) or to opaque semantic criteria (“similar documents”). Moreover, these interfaces do not support the systematic comparison and contrasting of different information sources that is central to information literacy. In this paper, we describe Porpoise, a system that provides users with a toolkit for in-depth semantic analysis of a collection of archives of their choice. Users can analyse, compare and contrast corpora in both “global” and “local” fashion. Clustering, semi-automatic ontology learning, a new form of multidimensional nearest-neighbour search, and visualisation help users to navigate a multilingual corpus of news and blogs by semantic criteria of their own design.