PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Constructing information networks from textual documents
Matjaz Jursic, Nada Lavrac, Igor Mozetic, Vid Podpecan and Hannu Toivonen
In: Workshop on Explorative Analytics of Information Networks, Sep 2009, Bled, Slovenia.

Abstract

A major challenge for next-generation data mining systems is creative knowledge discovery from diverse and distributed data/knowledge sources. A key step in this task is information fusion: merging diverse representations into a single, unified data/knowledge format. This paper focuses on the graph representation of data/knowledge generated from text documents available on the web. The problem addressed is how to create an information network, named a BisoNet, efficiently and effectively from large text corpora. Several options for node and arc representation are discussed, and a case-study information network is built from articles on autism downloaded from PubMed, a repository of medical publications. Open issues and lessons learned about these representation choices are also presented.
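The abstract does not say which node and arc representation the BisoNet uses. The following is a minimal, hypothetical sketch of just one option of the kind the abstract alludes to. It is not the authors' method. It assumes that nodes are distinct terms and that an arc's weight is the number of documents in which both terms occur. The function names, the tiny stopword list, and the `min_count` threshold are all illustrative assumptions.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical, deliberately tiny stopword list used only for illustration.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "with", "to", "for"}


def tokenize(text):
    """Lowercase the text and return its set of non-stopword word tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def build_cooccurrence_network(documents, min_count=1):
    """Build a weighted, undirected term network (assumed representation).

    Nodes: every distinct term, mapped to its document frequency.
    Arcs:  unordered term pairs, weighted by the number of documents in
           which both terms occur; arcs with weight < min_count are dropped.
    """
    node_counts = Counter()
    edge_counts = Counter()
    for doc in documents:
        # Sorting gives each unordered pair a canonical (a, b) key with a < b.
        terms = sorted(tokenize(doc))
        node_counts.update(terms)
        edge_counts.update(combinations(terms, 2))
    edges = {pair: w for pair, w in edge_counts.items() if w >= min_count}
    return dict(node_counts), edges


if __name__ == "__main__":
    docs = [
        "Autism is a neurodevelopmental disorder.",
        "Genetic factors in autism disorder.",
        "Genetic studies of autism.",
    ]
    nodes, edges = build_cooccurrence_network(docs, min_count=2)
    print(nodes)
    print(edges)
```

Document-level co-occurrence is chosen here only because it is simple. Sentence-level windows, TF-IDF weighting, or similarity measures between term vectors are equally plausible arc definitions, and the abstract leaves that choice open.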

EPrint Type: Conference or Workshop Item (Paper)
Subjects: Information Retrieval & Textual Information Access
ID Code: 5917
Deposited By: Hannu Toivonen
Deposited On: 08 March 2010