PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Endjinn: a scalable topic-based open source search engine
Wray Buntine, Jaakko Lofström, Jukka Perkiö, Sami Perttu, Vladimir Poroshin, Tomi Silander, Henry Tirri, Antti Tuominen and Ville Tuulos
In: WI 2004 (IEEE/WIC/ACM International Conference on Web Intelligence), 20-24 Sep 2004, Beijing, China.


Site-based or topic-specific search engines have had mixed success, both because information retrieval is inherently difficult and because such collections lack the link information needed to identify authoritative pages. We advocate an open source approach because of the problem's scope and its need for many software components. We have adopted a topic-based search engine because it represents the next generation of search capability. This paper outlines our scalable system for site-based or topic-specific search and demonstrates the system, still under development, on a small collection of 250,000 EU and UN web pages.

EPrint Type: Conference or Workshop Item (Paper)
Subjects: Computational, Information-Theoretic Learning with Statistics; Natural Language Processing; Information Retrieval & Textual Information Access
ID Code: 146
Deposited By: Sami Perttu
Deposited On: 31 May 2004