Using Discrete PCA on Web Pages
Wray Buntine, Sami Perttu and Ville Tuulos
In: Statistical Approaches to Web Mining, 20 September 2004, Pisa, Italy.
Discrete PCA builds components for discrete data, much
as PCA and ICA do for real-valued data.
The method has a long history and is most commonly
used in genetics. Recent insights into the method
are described here, and some examples
are given of its use in automatically building a topic model
for a document collection, and of its use
as a tool for relevance estimation in search.
The topic model can itself subsequently be used in search.
This discussion paper describes our ongoing research in this area.