PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Structured-Content Extraction from the Web for Bibliographic Reference Generation
Ramon Xuriguera and Marta Arias
In: Recherche et Fouille d’Information sur le Web, 25-28 Jan 2011, Brest, France.


In this paper we present a system that automatically creates bibliographic indexes from a collection of PDF files by using the file contents to search the Web and then extracting the information from the resulting pages. We pay special attention to the techniques used for extracting this data, as well as to the automatic generation of extraction rules and their evaluation.
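To make the idea of automatically generated extraction rules concrete, here is a minimal sketch of one generic approach: a left/right delimiter rule (an "LR wrapper") learned from pages on which the target field value, such as a paper title, is already known, then applied to a new page. This is an illustration under stated assumptions, not the authors' method: the function names `learn_rule` and `apply_rule`, the 30-character context window, and the sample HTML are all hypothetical.

```python
import os


def learn_rule(examples, context=30):
    """Learn a hypothetical (left, right) delimiter rule from (page, value) pairs.

    The left delimiter is the longest common suffix of the text just before
    each known value; the right delimiter is the longest common prefix of the
    text just after it. Only the first occurrence of each value is used.
    """
    lefts, rights = [], []
    for page, value in examples:
        start = page.find(value)
        if start == -1:
            continue  # value not present on this page; skip the example
        end = start + len(value)
        lefts.append(page[max(0, start - context):start])
        rights.append(page[end:end + context])
    if not lefts:
        raise ValueError("no example page contains its known value")
    # os.path.commonprefix compares character by character, so reversing the
    # strings turns it into a common-suffix computation.
    left = os.path.commonprefix([s[::-1] for s in lefts])[::-1]
    right = os.path.commonprefix(rights)
    if not left or not right:
        raise ValueError("example pages share no common delimiters")
    return left, right


def apply_rule(rule, page):
    """Return the text between the learned delimiters, or None if absent."""
    left, right = rule
    start = page.find(left)
    if start == -1:
        return None
    start += len(left)
    end = page.find(right, start)
    if end == -1:
        return None
    return page[start:end].strip()


if __name__ == "__main__":
    # Hypothetical result pages whose title values are already known.
    training = [
        ('<html><td class="ttl">Structured-Content Extraction</td><td>2011</td>',
         "Structured-Content Extraction"),
        ('<html><td class="ttl">Another Paper Title</td><td>1998</td>',
         "Another Paper Title"),
    ]
    rule = learn_rule(training)
    new_page = '<html><td class="ttl">A Third Title</td><td>2009</td>'
    print(apply_rule(rule, new_page))  # A Third Title
```

Rules of this kind are cheap to learn but can over-fit to accidental shared context (for instance, two training years that happen to share leading digits would end up in the right delimiter), which is one reason evaluating generated rules on held-out pages matters.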

EPrint Type: Conference or Workshop Item (Paper)
Subjects: Information Retrieval & Textual Information Access
ID Code: 8025
Deposited By: Marta Arias
Deposited On: 17 March 2011