Topic-Specific Scoring of Documents with Discrete PCA

Abstract

The random surfer model for scoring documents, used for instance by PageRank, works well when a collection has good link structure. Here, we develop a topic-specific version that uses a topic structure learned automatically by discrete PCA methods. To evaluate the resulting method, we compute scores on Wikipedia, the free online encyclopedia, because it has a good internal link structure and its results can be readily interpreted from the page titles.
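The following Python is a minimal sketch of the random surfer model. It computes PageRank-style scores by power iteration over a link matrix. The optional `teleport` vector is an illustrative assumption and not necessarily the topic-specific model developed in this work. Setting it to one topic's column of per-document topic proportions (for example, as estimated by a discrete PCA model) biases the surfer's random jumps toward pages about that topic, in the spirit of personalized PageRank. The function name `pagerank`, the damping value 0.85 and the matrix `theta` are hypothetical.

```python
import numpy as np


def pagerank(adj, teleport=None, damping=0.85, tol=1e-10, max_iter=1000):
    """Random-surfer scores by power iteration.

    adj[i, j] = 1 if page i links to page j.
    teleport: distribution over pages used for random jumps. None gives
    standard (uniform) PageRank. A topic-weighted vector gives a
    topic-biased variant, which is an illustrative assumption here.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    if teleport is None:
        teleport = np.full(n, 1.0 / n)
    else:
        teleport = np.asarray(teleport, dtype=float)
        teleport = teleport / teleport.sum()  # normalize to a distribution

    out_deg = adj.sum(axis=1)
    dangling = out_deg == 0
    # Row-normalize links into transition probabilities; dangling rows stay zero.
    trans = np.zeros_like(adj)
    trans[~dangling] = adj[~dangling] / out_deg[~dangling][:, None]

    r = teleport.copy()
    for _ in range(max_iter):
        # With prob. `damping` follow an out-link. Otherwise, or when stuck on a
        # dangling page, jump to a page drawn from `teleport`.
        jump_mass = damping * r[dangling].sum() + (1.0 - damping)
        new = damping * (r @ trans) + jump_mass * teleport
        if np.abs(new - r).sum() < tol:
            return new
        r = new
    return r


if __name__ == "__main__":
    # Hypothetical 3-page link graph and per-page topic proportions (2 topics).
    links = np.array([[0, 1, 1],
                      [1, 0, 0],
                      [0, 1, 0]])
    theta = np.array([[0.9, 0.1],
                      [0.2, 0.8],
                      [0.5, 0.5]])
    print("global:", pagerank(links))
    for k in range(theta.shape[1]):
        print(f"topic {k}:", pagerank(links, teleport=theta[:, k]))
```

Dangling pages redistribute their mass through the same `teleport` vector. This keeps every score vector a probability distribution, so global and per-topic scores can be compared directly.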