PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Using DMoz for constructing ontology from data stream
Marko Grobelnik, Janez Brank, Dunja Mladenić, Blaz Novak and Blaz Fortuna
In: International Conference on Information Technology Interfaces, 19-22 Jun 2006, Cavtat, Dubrovnik, Croatia.

Abstract

This paper presents an approach for constructing an ontology from a stream of documents. Named entities extracted from the documents are used as instances of the ontology. Entities and co-occurring entity pairs are represented by feature vectors based on the content of the documents in which they occur. In general, concepts and relations can be formed into an ontological structure either by clustering or by classification into an existing topic hierarchy. We propose the latter, using DMoz as the existing topic hierarchy. The approach is efficient and can scale to large data sets. We propose a framework that incorporates the stream mining process into a formal definition of the ontology. We describe a software component implementing this approach, and present experiments using a large collection of news.
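The pipeline outlined in the abstract can be illustrated with a small sketch. The abstract does not state how the feature vectors are weighted or which classifier is used, so the choices here are assumptions made only for illustration: raw term counts as features and a nearest-centroid cosine match against topic vectors. The topic paths, entity names, sample sentences, and helper functions (`tokenize`, `cosine`, `build_vectors`, `classify`) are all hypothetical, and named-entity extraction is assumed to have already been done.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def tokenize(text):
    """Lowercase whitespace tokenizer that strips basic punctuation."""
    words = (w.strip(".,;:!?()").lower() for w in text.split())
    return [w for w in words if w]


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_vectors(stream):
    """Accumulate context vectors for entities and co-occurring entity pairs.

    Each stream item is (document text, entities found in that document).
    """
    entity_vecs, pair_vecs = {}, {}
    for text, entities in stream:
        terms = Counter(tokenize(text))
        unique = sorted(set(entities))
        for entity in unique:
            entity_vecs.setdefault(entity, Counter()).update(terms)
        for pair in combinations(unique, 2):
            pair_vecs.setdefault(pair, Counter()).update(terms)
    return entity_vecs, pair_vecs


def classify(vec, topics):
    """Return the topic whose vector is most similar to vec (assumed classifier)."""
    return max(topics, key=lambda name: cosine(vec, topics[name]))


# Hypothetical stand-ins for topic-hierarchy categories, each described by a few terms.
TOPICS = {
    "Top/Business/Finance": Counter(tokenize("bank shares market profit investors stock")),
    "Top/Sports/Soccer": Counter(tokenize("match goal league striker coach season")),
}

# Hypothetical document stream with pre-extracted named entities.
STREAM = [
    ("Acme Bank shares rose as investors welcomed the profit report.", ["Acme Bank"]),
    ("Striker Jan Novak scored a late goal for FC Example in the league match.",
     ["Jan Novak", "FC Example"]),
    ("FC Example coach praised Jan Novak after the season opener.",
     ["Jan Novak", "FC Example"]),
]

entity_vecs, pair_vecs = build_vectors(STREAM)
entity_topics = {e: classify(v, TOPICS) for e, v in entity_vecs.items()}
pair_topics = {p: classify(v, TOPICS) for p, v in pair_vecs.items()}
```

Because the vectors are accumulated one document at a time, the same loop can in principle be run over an unbounded stream, with entities reclassified as their context vectors grow.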

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Learning/Statistics & Optimisation; Information Retrieval & Textual Information Access
ID Code: 2332
Deposited By: Dunja Mladenić
Deposited On: 22 November 2006