PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Topic-Specific Scoring of Documents with Discrete PCA
Wray Buntine and Kimmo Valtonen
(2004) Working Paper. HIIT, Helsinki, Finland.


The random surfer model for scoring documents, as used for instance in PageRank, works well when a collection has good link structure. Here, we develop a topic-specific version that uses a topic structure learned automatically by discrete PCA methods. To evaluate the resulting method, we compute scores on Wikipedia, the free encyclopedia on the web, because it has good internal link structure and its results can be readily interpreted from the page titles.
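The abstract does not say exactly how the topic structure enters the random surfer. The sketch below is therefore an assumption and not the paper's method. It shows one common way to make a random-surfer score topic-specific: bias the surfer's random jumps toward pages that are strongly about a given topic, as in personalized PageRank. Here `theta[i, k]` is assumed to be the proportion of topic k in page i, as estimated by some discrete PCA model (for example LDA or multinomial PCA). The names `topic_pagerank` and `topic_specific_scores` and the damping value 0.85 are illustrative choices.

```python
import numpy as np


def topic_pagerank(adj, teleport, damping=0.85, tol=1e-10, max_iter=1000):
    """Random-surfer (PageRank-style) scores with a custom jump distribution.

    adj[i, j] = 1 if page i links to page j.
    teleport: nonnegative weights over pages; normalised to sum to 1 here.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    v = np.asarray(teleport, dtype=float)
    if v.sum() <= 0:
        raise ValueError("teleport weights must have a positive sum")
    v = v / v.sum()

    # Row-normalise the link matrix so each page splits its score over its out-links.
    out_deg = adj.sum(axis=1)
    has_out = out_deg > 0
    P = np.zeros_like(adj)
    P[has_out] = adj[has_out] / out_deg[has_out][:, None]

    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        # Pages with no out-links (dangling pages) send their mass along the jump vector.
        dangling_mass = r[~has_out].sum()
        r_new = damping * (r @ P + dangling_mass * v) + (1.0 - damping) * v
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
    return r


def topic_specific_scores(adj, theta, damping=0.85):
    """One score vector per topic: column k jumps to pages in proportion to theta[:, k].

    theta is a hypothetical (pages x topics) matrix of topic proportions
    assumed to come from a discrete PCA model.
    """
    theta = np.asarray(theta, dtype=float)
    return np.column_stack(
        [topic_pagerank(adj, theta[:, k], damping) for k in range(theta.shape[1])]
    )
```

For example, consider four pages where pages 0 and 1 lean toward topic 0 and pages 2 and 3 lean toward topic 1. Page 0 then receives a noticeably higher score in the topic-0 column than in the topic-1 column. Each column is a probability distribution over pages, so the pages can be ranked separately for each topic.

```python
adj = [[0, 1, 1, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
theta = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
scores = topic_specific_scores(adj, theta)  # shape (4, 2)
```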

EPrint Type: Monograph (Working Paper)
Subjects: Learning/Statistics & Optimisation; Information Retrieval & Textual Information Access
ID Code: 543
Deposited By: Wray Buntine
Deposited On: 25 December 2004