Structured-Content Extraction from the Web for Bibliographic Reference Generation

Abstract

In this paper we present a system that automatically creates bibliographic indexes from a collection of PDF files by using the file contents to search the Web and then extracting the information from the resulting pages. We pay special attention to the techniques used for extracting this data, as well as to the automatic generation of extraction rules and their evaluation.