Language Pragmatics, Contexts and a Search Engine

Abstract

We introduce and motivate an approach to content-based information retrieval. We treat the corpus formed by real-world web pages as a dynamic and descriptive sample of natural language upon which measures of relevance can be built. This setting is quite typical in information retrieval, although it is seldom applied to the Web. However, we also depart from the information retrieval tradition by not trying to model syntax or semantics. Instead, we rely on the pragmatic dimension of language. The central characteristic of our approach is that it is lossless. Instead of building elaborate and often brittle abstractions from the data, we let the user reflect her conception of semantics onto the corpus in an efficient and flexible manner. We conclude with a few examples from our full-blown search engine, which puts the ideas presented in this paper into practice.