PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Inferring Document Similarity from Hyperlinks
David Grangier and Samy Bengio
In: Proceedings of the Conference on Information and Knowledge Management, CIKM, 2005.

Abstract

Assessing semantic similarity between text documents is a crucial aspect of Information Retrieval systems. In this work, we propose to use hyperlink information to derive a similarity measure that can then be applied to compare any text documents, with or without hyperlinks. As linked documents are generally semantically closer than unlinked documents, we use a training corpus with hyperlinks to infer a function (a, b) → sim(a, b) that assigns a higher value to linked documents than to unlinked ones. Two sets of experiments on different corpora show that this function compares favorably with OKAPI matching on document retrieval tasks.
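
The abstract does not state the model or the training procedure, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a bag-of-words representation, one learned weight per term, and a margin-based pairwise update over (document, linked document, unlinked document) triplets. The names bow, sim and train, the toy documents, and all parameter values are hypothetical.

```python
from collections import Counter


def bow(text):
    """Lowercased bag-of-words term counts (assumed representation)."""
    return Counter(text.lower().split())


def sim(w, a, b):
    """Weighted inner product over shared terms; unseen terms default to weight 1.0."""
    return sum(w.get(t, 1.0) * a[t] * b[t] for t in a if t in b)


def train(triplets, epochs=20, lr=0.1, margin=1.0):
    """Learn term weights so that sim(a, linked) exceeds sim(a, unlinked) by a margin.

    Each triplet is (a, linked, unlinked). This is a plain hinge-loss
    subgradient update, chosen for illustration only.
    """
    w = {}
    for _ in range(epochs):
        for a, pos, neg in triplets:
            if sim(w, a, pos) - sim(w, a, neg) >= margin:
                continue  # ranking constraint already satisfied
            for t in a:
                # Counter returns 0 for terms missing from pos or neg.
                step = a[t] * (pos[t] - neg[t])
                if step:
                    w[t] = w.get(t, 1.0) + lr * step
    return w


# Toy triplet: frequent stop words make the unlinked document look closer at first.
a = bow("the neural network the model")
pos = bow("neural model")          # pretend a links to this document
neg = bow("the the cable the")     # pretend no link exists
w = train([(a, pos, neg)])
```

With uniform weights the unlinked toy document scores higher because of the repeated stop word; after training, the learned weights rank the linked document first. Because the weights are attached to terms rather than to documents, the resulting function can also score pairs of documents that carry no hyperlinks, which is the property the abstract emphasizes.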

EPrint Type: Conference or Workshop Item (Poster)
Project Keyword: UNSPECIFIED
Subjects: Information Retrieval & Textual Information Access
ID Code: 1098
Deposited By: Samy Bengio
Deposited On: 26 September 2005