PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

A SURVEY OF FOCUSED WEB CRAWLING ALGORITHMS
Blaz Novak
In: SIKDD 2004 at multiconference IS 2004, 12-15 Oct 2004, Ljubljana, Slovenia.

Abstract

Web search engines collect data from the Web by “crawling” it – simulating browsing by extracting links from pages, downloading the linked pages, and repeating the process ad infinitum. This requires enormous hardware and network resources, and ends with a large fraction of the visible Web on the crawler’s storage array. When only information about a predefined set of topics is desired, a specialization of this process called “focused crawling” is used instead. What follows is a short review of existing focused crawling techniques.
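The crawl-and-prioritize loop the abstract describes can be sketched as a best-first crawler: keep a frontier of unvisited URLs ordered by the relevance of the page that linked to them, and expand the most promising URL first. The sketch below is purely illustrative – it substitutes a tiny in-memory “web” (the `WEB` dict, page text plus outlinks) for real HTTP fetching and link extraction, and uses a simple keyword-overlap relevance score; all names are hypothetical, not taken from any paper surveyed here.

```python
import heapq

# Toy in-memory "web": url -> (page text, outlinks). Purely illustrative
# stand-in for real HTTP fetching and HTML link extraction.
WEB = {
    "a": ("machine learning survey", ["b", "c"]),
    "b": ("cooking recipes", ["d"]),
    "c": ("learning algorithms overview", ["d", "e"]),
    "d": ("sports news", []),
    "e": ("statistical learning theory", []),
}

def relevance(text, topic_words):
    """Fraction of topic words that occur in the page text."""
    words = set(text.split())
    return sum(w in words for w in topic_words) / len(topic_words)

def focused_crawl(seed, topic_words, threshold=0.0):
    """Best-first focused crawl: expand the most promising URL first."""
    frontier = [(-1.0, seed)]   # max-heap via negated scores
    seen = {seed}
    visited = []
    while frontier:
        _, url = heapq.heappop(frontier)
        text, links = WEB[url]
        visited.append(url)
        score = relevance(text, topic_words)
        # Do not expand links found on off-topic pages (except the seed).
        if score <= threshold and url != seed:
            continue
        for link in links:
            if link not in seen:
                seen.add(link)
                # A child inherits its parent's score as its priority.
                heapq.heappush(frontier, (-score, link))
    return visited
```

For example, `focused_crawl("a", ["learning", "statistical"])` starts at the seed and, because the off-topic page `b` scores zero, never expands `b`'s outlinks; page `d` is still reached via the on-topic page `c`. Real focused crawlers differ mainly in how this priority is computed, which is the subject of the survey.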

EPrint Type: Conference or Workshop Item (Paper)
Subjects: Information Retrieval & Textual Information Access
ID Code: 738
Deposited By: Blaz Fortuna
Deposited On: 30 December 2004