On the performance of clustering in Hilbert spaces
Gerard Biau, Luc Devroye and Gábor Lugosi
IEEE Transactions on Information Theory, Volume 54, pp. 781-790, 2008.

## Abstract

Based on $n$ randomly drawn vectors in a separable Hilbert space, one may construct a $k$-means clustering scheme by minimizing an empirical squared error. We investigate the risk of such a clustering scheme, defined as the expected squared distance of a random vector $X$ from the set of cluster centers. Our main result states that, for an almost surely bounded $X$, the expected excess clustering risk is $O(\sqrt{1/n})$. Since clustering in high-dimensional (or even infinite-dimensional) spaces may lead to severe computational problems, we examine the properties of a dimension reduction strategy for clustering based on Johnson-Lindenstrauss-type random projections. Our results reflect a tradeoff between accuracy and computational complexity when one uses $k$-means clustering after random projection of the data to a low-dimensional space. We argue that random projections work better than other simplistic dimension reduction schemes.
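
As a rough illustration of the strategy described in the abstract, the following sketch projects finite-dimensional data with a Gaussian random matrix, clusters the projected points, and evaluates the empirical squared-error risk in the original space. It is a minimal sketch under stated assumptions, not the authors' procedure: the function names are hypothetical, Lloyd's iteration is swapped in as a heuristic (it does not guarantee the global empirical risk minimizer that the abstract analyzes), and lifting the centers back as cluster means in the original space is an assumed design choice.

```python
import numpy as np


def empirical_risk(points, centers):
    """Mean squared distance from each point to its nearest center."""
    # Pairwise squared distances, shape (n_points, n_centers).
    sq_dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return sq_dists.min(axis=1).mean()


def assign(points, centers):
    """Index of the nearest center for every point."""
    sq_dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return sq_dists.argmin(axis=1)


def lloyd_kmeans(points, k, rng, n_iter=50):
    """Lloyd's heuristic (assumption: a local minimizer is good enough here)."""
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(points, centers)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers, assign(points, centers)


def random_projection(points, target_dim, rng):
    """Gaussian projection, scaled to preserve squared norms in expectation."""
    source_dim = points.shape[1]
    matrix = rng.normal(size=(source_dim, target_dim)) / np.sqrt(target_dim)
    return points @ matrix


def cluster_after_projection(points, k, target_dim, seed=0):
    """Cluster in the projected space, then build centers in the original space."""
    rng = np.random.default_rng(seed)
    projected = random_projection(points, target_dim, rng)
    _, labels = lloyd_kmeans(projected, k, rng)
    # Assumed lifting step: average the original points of each non-empty cluster.
    centers = np.stack(
        [points[labels == j].mean(axis=0) for j in np.unique(labels)]
    )
    return centers, labels
```

Calling `cluster_after_projection(data, k, target_dim)` and then `empirical_risk(data, centers)` gives one way to compare the risk obtained at different values of `target_dim`. Smaller target dimensions make the clustering step cheaper but may increase the risk, which is the tradeoff the abstract refers to.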

- EPrint Type: Article
- Project Keyword: UNSPECIFIED
- Subjects: Computational, Information-Theoretic Learning with Statistics; Learning/Statistics & Optimisation; Theory & Algorithms
- ID Code: 3929
- Deposited By: Gábor Lugosi
- Deposited On: 25 February 2008