PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Hierarchical Indexing and Document Matching in BoW
Maayan Geffet and Dror G. Feitelson
JCDL, June 2001, Roanoke, VA.

Abstract

BoW is an on-line bibliographical repository based on a hierarchical concept index to which entries are linked. Searching in the repository should therefore return matching topics from the hierarchy, rather than just a list of entries. Likewise, when new entries are inserted, a search for relevant topics to which they should be linked is required. We develop a vector-based algorithm that creates keyword vectors for the set of competing topics at each node in the hierarchy, and show how its performance improves when domain-specific features are added (such as special handling of topic titles and author names). A 7-fold cross-validation on a corpus of some 3,500 entries with a 5-level index yields hit ratios in the range of 89-95%, and most of the misclassified entries are ambiguous to begin with.
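
The top-down matching idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `TopicNode` class, the raw term-count weighting, the cosine similarity measure, the `title_weight` boost, and the sample topics are all assumptions made for illustration. At each node, one keyword vector is built per competing child topic, and the query descends into the child whose vector is most similar.

```python
from collections import Counter
import math


class TopicNode:
    """A topic in a hypothetical concept hierarchy (illustrative only)."""

    def __init__(self, title, entries=(), children=()):
        self.title = title
        self.entries = list(entries)    # texts of entries linked directly to this topic
        self.children = list(children)  # competing subtopics


def tokenize(text):
    """Lowercase whitespace tokenizer that keeps purely alphabetic words."""
    return [w for w in text.lower().split() if w.isalpha()]


def keyword_vector(node, title_weight=3):
    """Term counts for a topic: boosted title words, own entries, and the whole subtree.

    The title boost is an assumed stand-in for the special handling of topic titles.
    """
    vec = Counter()
    for word in tokenize(node.title):
        vec[word] += title_weight
    for entry in node.entries:
        vec.update(tokenize(entry))
    for child in node.children:
        vec.update(keyword_vector(child, title_weight))
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def match(root, text):
    """Descend from the root, picking the most similar competing child at each level."""
    query = Counter(tokenize(text))
    path, node = [], root
    while node.children:
        scored = [(cosine(query, keyword_vector(c)), c) for c in node.children]
        best_score, best = max(scored, key=lambda s: s[0])
        if best_score == 0.0:  # no child shares any keyword with the query; stop here
            break
        path.append(best.title)
        node = best
    return path


# Hypothetical three-level index with made-up topics and entries.
root = TopicNode("root", children=[
    TopicNode("scheduling", children=[
        TopicNode("gang scheduling", entries=["coscheduling of parallel jobs on clusters"]),
        TopicNode("backfilling", entries=["easy backfilling with runtime estimates"]),
    ]),
    TopicNode("file systems", entries=["parallel file system striping and caching"]),
])

print(match(root, "backfilling using user runtime estimates"))
# -> ['scheduling', 'backfilling']
```

The same `match` call would serve both use cases mentioned above: a search query returns a topic path instead of a flat list of entries, and a new entry's text yields a candidate topic to link it to. A practical version would presumably cache the per-node vectors and use a weighting such as TF-IDF rather than raw counts.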

EPrint Type: Article
Project Keyword: UNSPECIFIED
Subjects: Information Retrieval & Textual Information Access
ID Code: 382
Deposited By: Maayan Geffet
Deposited On: 18 December 2004