Structured-Content Extraction from the Web for Bibliographic Reference Generation
Ramon Xuriguera and Marta Arias
In: Recherche et Fouille d’Information sur le Web, 25-28 Jan 2011, Brest, France.
In this paper we present a system that automatically creates bibliographic indexes from a collection of PDF files by using the file contents to search the Web and then extracting the information from the resulting pages. We pay special attention to the techniques used for extracting this data, as well as to the automatic generation of extraction rules and their evaluation.
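To make the extraction step concrete, the following is a minimal sketch of how a set of extraction rules might be applied to a result page and turned into a reference entry. It is an illustration only: the abstract does not specify the rule format, so this sketch assumes one regular expression per field. The `RULES` table, the HTML class names, the functions `apply_rules` and `to_bibtex`, and the `@misc` entry type are all hypothetical and are not taken from the system described here.

```python
import re
from html import unescape

# Hypothetical extraction rules: one regular expression per bibliographic
# field. The class names are assumptions about the page markup.
RULES = {
    "title": re.compile(r'<h1[^>]*class="title"[^>]*>(.*?)</h1>', re.S),
    "author": re.compile(r'<span[^>]*class="author"[^>]*>(.*?)</span>', re.S),
    "year": re.compile(r'<span[^>]*class="year"[^>]*>(\d{4})</span>'),
}


def apply_rules(html, rules=RULES):
    """Return a dict mapping each field to the list of values its rule matched."""
    record = {}
    for field, pattern in rules.items():
        values = [unescape(v).strip() for v in pattern.findall(html)]
        if values:  # skip fields whose rule matched nothing
            record[field] = values
    return record


def to_bibtex(key, record):
    """Format an extracted record as a BibTeX-style entry (entry type assumed)."""
    lines = ["@misc{%s," % key]
    for field, values in record.items():
        # Multiple values (e.g. several authors) are joined with " and ".
        lines.append("  %s = {%s}," % (field, " and ".join(values)))
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    page = (
        '<h1 class="title">A &amp; B</h1>'
        '<span class="author">Ann Lee</span>'
        '<span class="author">Bo Chen</span>'
        '<span class="year">2011</span>'
    )
    print(to_bibtex("example2011", apply_rules(page)))
```

Keeping the rules in a plain table, separate from the code that applies them, is one way to let rules be generated and evaluated automatically per site without changing the extraction code itself.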