Randomized Embedding Cluster Ensembles for gene expression data analysis
Alberto Bertoni and Giorgio Valentini
In: SETIT 2007 - IEEE International Conf. on Sciences of Electronic, Technologies of Information and Telecommunications, Hammamet, Tunisia (2007).
In the framework of unsupervised pattern analysis of gene expression, the high dimensionality of the data as well as the accuracy of clustering algorithms and the reliability of the discovered clusters are critical problems.
We propose and analyze an algorithmic scheme for unsupervised cluster ensembles, where the dimensionality reduction is obtained by means of randomized embeddings with low distortion. Multiple "base" clusterings are performed on random subspaces, approximately preserving the distances between the projected examples. In this way the accuracy of each "base" clustering is maintained, and the diversity between them is improved. By combining the multiple clusterings, we can enhance the overall accuracy and the reliability of the discovered clusters, as shown by our experimental results with high-dimensional gene expression data.
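The scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian projection matrix, the plain k-means base learner, the co-association matrix, and the thresholded connected-components consensus step are all assumptions chosen for brevity, and every function and parameter name (`random_projection`, `kmeans`, `coassociation`, `consensus_labels`, `d_out`, `n_base`, `threshold`) is hypothetical.

```python
import numpy as np

def random_projection(X, d_out, rng):
    """Project the rows of X onto d_out dimensions with a Gaussian random matrix."""
    d_in = X.shape[1]
    # Entries ~ N(0, 1/d_out), so squared distances are preserved in expectation.
    R = rng.normal(0.0, 1.0 / np.sqrt(d_out), size=(d_in, d_out))
    return X @ R

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means (assumed base clusterer); returns one label per row."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def coassociation(X, k, d_out, n_base, seed=0):
    """Fraction of base clusterings in which each pair of examples is co-clustered."""
    rng = np.random.default_rng(seed)
    n = len(X)
    co = np.zeros((n, n))
    for _ in range(n_base):
        # Each base clustering sees a different random low-dimensional embedding.
        labels = kmeans(random_projection(X, d_out, rng), k, rng)
        co += labels[:, None] == labels[None, :]
    return co / n_base

def consensus_labels(co, threshold=0.5):
    """Consensus clusters: connected components of the graph co > threshold."""
    n = len(co)
    labels = -np.ones(n, dtype=int)
    current = 0
    for i in range(n):
        if labels[i] >= 0:
            continue
        labels[i] = current
        stack = [i]
        while stack:
            u = stack.pop()
            for v in np.flatnonzero((co[u] > threshold) & (labels < 0)):
                labels[v] = current
                stack.append(v)
        current += 1
    return labels

if __name__ == "__main__":
    # Synthetic stand-in for expression data: 40 samples, 1000 "genes", 2 groups.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (20, 1000)), rng.normal(5, 1, (20, 1000))])
    co = coassociation(X, k=2, d_out=50, n_base=10)
    print(consensus_labels(co))
```

In this sketch the entries of the co-association matrix also act as a rough per-pair reliability score: pairs that are grouped together under most random embeddings get values near 1. Note that the thresholded consensus step does not force the final number of clusters to equal `k`.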
|EPrint Type:||Conference or Workshop Item (Paper)|
|Subjects:||Theory & Algorithms|
|Deposited By:||Giorgio Valentini|
|Deposited On:||13 February 2008|