PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Language Pragmatics, Contexts and a Search Engine
Ville Tuulos and Tomi Silander
Proceedings of the International and Interdisciplinary Conference on Adaptive Knowledge Representation and Reasoning, pp. 114-121, 2005.


We introduce and motivate an approach to content-based information retrieval. We treat the corpus formed by real-world web pages as a dynamic and descriptive sample of natural language upon which measures of relevance can be built. This setting is quite typical in information retrieval, although it is seldom applied to the Web. However, we also diverge from the information retrieval tradition by not trying to model syntax or semantics. Instead, we rely on the pragmatic dimension of language. The central characteristic of our approach is that it is lossless. Instead of building elaborate and often brittle abstractions based on the data, we let the user reflect her conception of semantics onto the corpus in an efficient and flexible manner. We conclude with a few examples from our full-blown search engine, which implements the ideas presented in this paper in practice.

EPrint Type: Article
Project Keyword: UNSPECIFIED
Subjects: Information Retrieval & Textual Information Access
ID Code: 1794
Deposited By: Ville Tuulos
Deposited On: 28 November 2005