PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Semantic Text Features from Small World Graphs
Jure Leskovec and John Shawe-Taylor
In: Subspace, Latent Structure and Feature Selection techniques: Statistical and Optimisation perspectives Workshop 2005, Bohinj, Slovenia (2005).


We present a set of methods for creating a semantic representation from a collection of textual documents. Given a document collection, we use a simple algorithm to connect the documents into a tree or a graph. Using the imposed topology, we define feature and document similarity measures. We use kernel alignment to compare the quality of the various similarity measures. Results show that the document similarity defined over the topology gives better alignment than the standard cosine similarity measure on a bag-of-words document representation.
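As a rough illustration of the comparison described in the abstract, the sketch below computes the cosine-similarity kernel on a bag-of-words matrix and scores it with the standard empirical kernel alignment, that is, the Frobenius inner product of two kernel matrices divided by the product of their Frobenius norms. The toy documents, the labels, the label-based target kernel, and the function names `cosine_kernel` and `kernel_alignment` are assumptions made for illustration; the abstract does not specify the paper's data or its exact alignment setup, and the topology-based similarity itself is not reproduced here.

```python
import numpy as np


def cosine_kernel(X):
    """Cosine similarity between the rows of a document-term count matrix."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0  # avoid division by zero for empty documents
    Xn = X / norms
    return Xn @ Xn.T


def kernel_alignment(K1, K2):
    """Empirical alignment: <K1, K2>_F / sqrt(<K1, K1>_F * <K2, K2>_F)."""
    num = np.sum(K1 * K2)
    den = np.sqrt(np.sum(K1 * K1) * np.sum(K2 * K2))
    return num / den


# Hypothetical toy data: 4 documents over a 5-word vocabulary, two classes.
X = np.array([
    [2, 1, 0, 0, 0],
    [1, 2, 1, 0, 0],
    [0, 0, 1, 2, 1],
    [0, 0, 0, 1, 2],
])
y = np.array([1, 1, -1, -1])

K_bow = cosine_kernel(X)       # baseline bag-of-words similarity
K_target = np.outer(y, y)      # assumed target kernel built from labels
alignment = kernel_alignment(K_bow, K_target)
print(f"alignment of cosine kernel with target: {alignment:.3f}")
```

Any other document similarity matrix, such as one derived from a tree or graph over the documents, could be passed to `kernel_alignment` in place of `K_bow` to compare the two measures on the same target.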

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics
ID Code: 1222
Deposited By: Jure Leskovec
Deposited On: 28 November 2005