Topic-Specific Scoring of Documents with Discrete PCA
Wray Buntine and Kimmo Valtonen
HIIT, Helsinki, Finland.
The random surfer model for scoring documents, used for
instance by PageRank, works well when a collection has a good
link structure. Here we develop a topic-specific version,
using a topic structure learned automatically via discrete
PCA methods. To evaluate the resulting method, we compute
scores on Wikipedia, the free encyclopedia on the web,
because it has a good internal link structure and the
results can be readily interpreted from the page titles.
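The abstract does not spell out the scoring procedure. As a point of reference only, here is a minimal sketch of one standard way to make a random surfer topic-specific. It runs a PageRank power iteration whose teleport step jumps to pages in proportion to their weight on a single topic, as in personalized or topic-sensitive PageRank. This is an illustrative assumption and not necessarily the authors' method. The function name `topic_pagerank`, its inputs, and the damping value 0.85 are hypothetical. The per-page topic weights are assumed to come from a discrete PCA model fitted elsewhere.

```python
def topic_pagerank(out_links, topic_weights, damping=0.85, tol=1e-10, max_iter=1000):
    """Hypothetical sketch: power iteration for a topic-biased random surfer.

    out_links: dict mapping page -> list of pages it links to.
    topic_weights: dict mapping page -> non-negative weight of one topic
        for that page (assumed to come from a discrete PCA model).
    Returns a dict of scores that sum to 1.
    """
    pages = sorted(set(out_links)
                   | {q for qs in out_links.values() for q in qs}
                   | set(topic_weights))
    total = sum(topic_weights.get(p, 0.0) for p in pages)
    if total <= 0:
        raise ValueError("topic weights must have a positive total")
    # Teleport distribution: the surfer restarts at pages in proportion
    # to their weight on the chosen topic.
    teleport = {p: topic_weights.get(p, 0.0) / total for p in pages}
    rank = dict(teleport)
    for _ in range(max_iter):
        new = {p: (1.0 - damping) * teleport[p] for p in pages}
        dangling = 0.0
        for p in pages:
            targets = out_links.get(p, [])
            if targets:
                # Follow a uniformly chosen out-link with probability `damping`.
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new[q] += share
            else:
                dangling += damping * rank[p]
        # Pages with no out-links send their mass back via the teleport vector.
        for p in pages:
            new[p] += dangling * teleport[p]
        diff = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if diff < tol:
            break
    return rank


# Toy example: a 3-cycle A -> B -> C -> A plus D -> A; only A carries the topic.
links = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["A"]}
weights = {"A": 1.0, "B": 0.0, "C": 0.0, "D": 0.0}
scores = topic_pagerank(links, weights)
```

In the toy example, page D has no in-links and no weight on the topic, so it scores zero. Mass then decays around the cycle away from the topical page A. Swapping in a different topic's weights re-ranks the same link graph, which is the sense in which such scores are topic-specific.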