PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

The Google Similarity Distance
Rudi Cilibrasi and Paul Vitányi
IEEE Transactions on Knowledge and Data Engineering, 2007.


We present a new theory of similarity between words and phrases based on information distance and Kolmogorov complexity. To fix thoughts, we use the World Wide Web as a database and Google as a search engine. The method is then applied to automatically extract the similarity of words and phrases, the Google similarity distance, from Google web page counts.
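In the full paper, the distance is computed from page counts as the normalized Google distance: NGD(x, y) = (max{log f(x), log f(y)} - log f(x, y)) / (log N - min{log f(x), log f(y)}). Here f(x) is the number of pages containing x, f(x, y) is the number containing both terms, and N is a normalizing total. The following is a minimal sketch of that formula only, assuming the counts have already been obtained by some other means. The function name `ngd` and the sample counts are hypothetical, and no search-engine API is involved.

```python
import math


def ngd(fx: float, fy: float, fxy: float, n: float) -> float:
    """Normalized Google distance computed from (hypothetical) page counts.

    fx, fy -- number of pages containing term x and term y, respectively
    fxy    -- number of pages containing both x and y
    n      -- normalizing total, assumed larger than fx and fy
    """
    if fx <= 0 or fy <= 0:
        raise ValueError("single-term counts must be positive")
    if fxy <= 0:
        # The terms never co-occur, so the distance is unbounded.
        return math.inf
    lx, ly, lxy, ln = math.log(fx), math.log(fy), math.log(fxy), math.log(n)
    # The base of the logarithm cancels in the ratio, so natural log is fine.
    return (max(lx, ly) - lxy) / (ln - min(lx, ly))


# Illustrative counts only, not real Google data.
print(ngd(100, 100, 10, 10_000))  # 0.5 (up to floating-point rounding)
```

Terms that always appear together get a distance of 0. The distance grows as their co-occurrence becomes rarer relative to their individual frequencies.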

EPrint Type: Article
Subjects: Learning/Statistics & Optimisation; Information Retrieval & Textual Information Access
ID Code: 2784
Deposited By: Paul Vitányi
Deposited On: 22 November 2006