PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Using Term-matching Algorithms for the Annotation of Geo-services
Miha Grcar, Eva Klien and Blaz Novak
(2009) Discussion Paper. Springer.


This paper presents an approach to automating semantic annotation within service-oriented architectures that provide interfaces to databases of spatial-information objects. Automating the annotation process facilitates the transition from current state-of-the-art architectures towards semantically enabled architectures. We see the annotation process as the task of matching an arbitrary word or term with the most appropriate concept in the domain ontology. The term-matching techniques that we present are based on text mining. To determine the similarity between two terms, we first associate a set of documents (which we obtain from a Web search engine) with each term. We then transform the documents into feature vectors and thus move the similarity assessment into the feature space. After that, we compute the similarity by training a classifier to distinguish between ontology concepts. Apart from text mining approaches, we also present an alternative technique, namely Google Distance, which proves less suitable for our task. The paper also presents the results of an extensive evaluation of the presented term-matching methods, which shows that these methods work best on synonymous nouns from a specific vocabulary. Furthermore, the fast and simple centroid-based classifier is shown to perform very well for this task. The main contribution of this paper is thus a term-matching algorithm based on text mining and information retrieval. In addition, the presented evaluation should give a notion of how the algorithm performs in various scenarios.
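
The pipeline described in the abstract (documents per term, feature vectors, centroid-based classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the feature weighting or the similarity measure, so TF-IDF weights and cosine similarity are assumptions here, and the function names (`tfidf_vectors`, `centroid`, `rank_concepts`) are hypothetical. Short hand-written strings would stand in for the documents that the paper obtains from a Web search engine.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def normalize(vec):
    """Scale a sparse vector (dict) to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def tfidf_vectors(docs):
    """Return one unit-length TF-IDF vector per document.

    Assumption: smoothed IDF, log((1 + n) / (1 + df)) + 1.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
               for t, c in tf.items()}
        vectors.append(normalize(vec))
    return vectors


def centroid(vectors):
    """Sum a list of sparse vectors and rescale the result to unit length."""
    total = Counter()
    for vec in vectors:
        for t, w in vec.items():
            total[t] += w
    return normalize(total)


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def rank_concepts(term_docs, concept_docs):
    """Rank ontology concepts by similarity to the documents of a term.

    term_docs: list of document strings associated with the term.
    concept_docs: dict mapping concept name to a list of document strings.
    Returns (concept, score) pairs, best match first.
    """
    labels, all_docs = [], []
    for concept, docs in concept_docs.items():
        for doc in docs:
            labels.append(concept)
            all_docs.append(doc)
    # Weight all documents together so terms and concepts share one IDF.
    vectors = tfidf_vectors(all_docs + list(term_docs))
    concept_vecs = vectors[:len(all_docs)]
    term_vecs = vectors[len(all_docs):]
    centroids = {
        c: centroid([v for v, l in zip(concept_vecs, labels) if l == c])
        for c in concept_docs
    }
    query = centroid(term_vecs)
    scores = [(c, cosine(query, cen)) for c, cen in centroids.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Calling `rank_concepts(term_docs, concept_docs)` returns the concepts ordered by similarity, so the first entry is the proposed annotation for the term. A centroid classifier needs only one averaged vector per concept, which is why it is cheap to train and apply compared with most other classifiers.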

EPrint Type: Monograph (Discussion Paper)
Project Keyword: UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics
          Information Retrieval & Textual Information Access
ID Code: 6396
Deposited By: Jan Rupnik
Deposited On: 08 March 2010