PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

An unsupervised fuzzy ensemble algorithmic scheme for gene expression data analysis
Roberto Avogadri and Giorgio Valentini
In: NETTAB 2007 workshop on a Semantic Web for Bioinformatics, Pisa, Italy (2007).


Background: In recent years unsupervised ensemble clustering methods have been successfully applied to DNA microarray data analysis to improve the accuracy and reliability of clustering results. Nevertheless, a major problem is that classes of functionally correlated examples (e.g. subclasses of diseases characterized at the bio-molecular level) are in general not clearly separable, and in many cases the same gene may belong to different functional classes (e.g. it may participate in different biological processes).

Results: We propose an ensemble clustering algorithmic scheme, based on a fuzzy approach, that directly handles overlapping classes as well as genes or samples that may belong to more than one cluster at the same time. Several fuzzy ensemble clustering algorithms may be derived from our scheme, according to the way the multiple clusterings are combined and the consensus clustering is generated. We test some of the proposed ensemble algorithms on two DNA microarray data sets available on the web, comparing the results with other single and ensemble clustering methods.

Conclusions: Our proposed fuzzy ensemble approach may be applied to discover classes of co-expressed genes or subclasses of functionally related examples, and in principle it may be applied to the unsupervised analysis of different types of complex bio-molecular data. Fuzzy ensemble algorithms can assign each gene or sample to multiple classes and can estimate and improve the accuracy and reliability of the discovered clusterings, as shown by our experimental results.
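
The abstract does not specify how the base clusterings are generated, combined, or turned into a consensus, so the following is only a minimal sketch of one way a fuzzy ensemble of this kind could be put together, not the authors' algorithm. Every concrete choice is an assumption made for illustration: plain fuzzy c-means as the base learner, random feature subsets to diversify the ensemble, the product of memberships summed over clusters as the pairwise similarity, a second fuzzy c-means run on the averaged similarity matrix as the consensus step, and all function names.

```python
import numpy as np

def fuzzy_c_means(X, k, m=2.0, n_iter=100, rng=None):
    """Plain fuzzy c-means; returns an (n_samples, k) membership matrix."""
    rng = np.random.default_rng(rng)
    U = rng.random((X.shape[0], k))
    U /= U.sum(axis=1, keepdims=True)          # each row sums to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distances from every sample to every center (small offset avoids 0).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U

def fuzzy_coassociation(X, k, n_base=10, seed=0):
    """Average fuzzy similarity between samples over n_base base clusterings."""
    rng = np.random.default_rng(seed)
    n, n_features = X.shape
    M = np.zeros((n, n))
    for _ in range(n_base):
        # Assumption: diversify the ensemble with a random half of the features.
        cols = rng.choice(n_features, size=max(1, n_features // 2), replace=False)
        U = fuzzy_c_means(X[:, cols], k, rng=rng)
        # Assumption: similarity of i and j is sum over clusters of u_ic * u_jc.
        M += U @ U.T
    return M / n_base

def consensus_memberships(M, k, seed=0):
    """Assumption: consensus obtained by fuzzy c-means on the rows of M."""
    return fuzzy_c_means(M, k, rng=seed)

# Toy data: two well-separated groups of 10 samples with 4 features each.
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(0.0, 0.1, (10, 4)),
               data_rng.normal(5.0, 0.1, (10, 4))])
M = fuzzy_coassociation(X, k=2)
U = consensus_memberships(M, k=2)
labels = U.argmax(axis=1)                      # optional hard assignment
```

Each row of `U` holds graded memberships that sum to one, so a sample lying between groups can keep substantial membership in more than one cluster; `labels` is only a convenience for comparison with hard clusterings.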

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Theory & Algorithms
ID Code: 3583
Deposited By: Giorgio Valentini
Deposited On: 13 February 2008