PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Randomized Embedding Cluster Ensembles for gene expression data analysis
Alberto Bertoni and Giorgio Valentini
In: SETIT 2007 - IEEE International Conf. on Sciences of Electronic, Technologies of Information and Telecommunications, Hammamet, Tunisia (2007).


In unsupervised pattern analysis of gene expression data, the high dimensionality of the data, the accuracy of clustering algorithms, and the reliability of the discovered clusters are all critical problems. We propose and analyze an algorithmic scheme for unsupervised cluster ensembles in which dimensionality reduction is obtained by means of randomized embeddings with low distortion. Multiple "base" clusterings are performed on random subspaces, using projections that approximately preserve the distances between examples. In this way the accuracy of each "base" clustering is maintained while the diversity among them is improved. Combining the multiple clusterings enhances the overall accuracy and the reliability of the discovered clusters, as shown by our experimental results on high-dimensional gene expression data.
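
The scheme described in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. Several choices here are assumptions made only for illustration: a Gaussian random matrix scaled by 1/sqrt(target_dim) stands in for the low-distortion randomized embedding, plain k-means is the base clustering algorithm, and the base clusterings are combined through a co-association (pairwise agreement) matrix that is then thresholded into connected components. All function names, parameters, and the toy data generator are hypothetical.

```python
import math
import random


def make_toy_data(n_per_cluster, dim, seed=0):
    """Hypothetical toy data: two Gaussian clusters centred at 0 and 10."""
    rng = random.Random(seed)
    low = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_per_cluster)]
    high = [[rng.gauss(10.0, 1.0) for _ in range(dim)] for _ in range(n_per_cluster)]
    return low + high


def random_projection(data, target_dim, rng):
    """Project each row to target_dim dimensions with a scaled Gaussian matrix.

    The 1/sqrt(target_dim) scaling preserves squared distances in expectation.
    """
    dim = len(data[0])
    scale = 1.0 / math.sqrt(target_dim)
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(target_dim)]
              for _ in range(dim)]
    return [[sum(x[i] * matrix[i][j] for i in range(dim))
             for j in range(target_dim)] for x in data]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def co_association(data, k, target_dim, n_base, seed=0):
    """Fraction of base clusterings in which each pair of examples co-clusters."""
    rng = random.Random(seed)
    n = len(data)
    sim = [[0.0] * n for _ in range(n)]
    for _ in range(n_base):
        labels = kmeans(random_projection(data, target_dim, rng), k, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    sim[i][j] += 1.0 / n_base
    return sim


def consensus_clusters(sim, threshold=0.5):
    """Connected components of the graph linking pairs with sim > threshold."""
    n = len(sim)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels
```

A call such as `consensus_clusters(co_association(make_toy_data(10, 200), k=2, target_dim=20, n_base=9))` runs nine base clusterings, each on a different 20-dimensional random projection of the 200-dimensional toy data, and merges them. Because each projection is drawn independently, the base clusterings differ from one another, which is the source of diversity the abstract refers to.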

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Theory & Algorithms
ID Code: 3621
Deposited By: Giorgio Valentini
Deposited On: 13 February 2008