PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Multi-spectral biclustering for data described by multiple similarities.
Farida Zehraoui and Florence d'Alché-Buc
(2008) Technical Report. IBISC, France.

Abstract

In computational biology as well as in information retrieval, objects of interest such as proteins, genes or documents can be described from various points of view: as sequences, trees, nodes in a graph, or vectors. Often only similarity matrices are available to represent each of these heterogeneous views. Existing data mining approaches that deal with heterogeneous data aim to extract objects that are similar across all the views. As the number of datasets increases, it is often the case that no subset of objects is similar under all the views simultaneously, except in trivial cases. In this framework, we develop an extension of biclustering, called multi-spectral biclustering, that finds subgroups of objects that are similar to each other according to some of the views. The new algorithm is based on multiple low-dimensional embeddings of the data, computed from the Laplacians of graphs weighted by the various similarities, and on a generalization of the squared residue minimization biclustering algorithm proposed by Sra et al. in 2004. We also propose to select the biclustering parameters using a stability criterion. Numerical results on public biological datasets (protein, enzyme and time-series gene expression data sets) show the value of this approach. In these datasets, the heterogeneous views of the objects are described by Gram matrices.
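The two ingredients named in the abstract, a Laplacian-based embedding per view and a squared-residue objective, can be illustrated with a minimal sketch. This is not the report's algorithm: the function names are hypothetical, the use of the symmetric normalized Laplacian is an assumption, and concatenating the per-view embeddings column-wise is only one plausible way to build the matrix that is then biclustered. The residue used is the standard one, h_ij = a_ij - (row mean) - (column mean) + (overall mean), computed over a candidate bicluster.

```python
import numpy as np

def laplacian_embedding(S, k):
    """Embed n objects into k dimensions from a symmetric similarity matrix S.

    Assumption: the symmetric normalized Laplacian is used; the report may
    use a different Laplacian or a different choice of eigenvectors.
    """
    S = np.asarray(S, dtype=float)
    degrees = S.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(degrees, 1e-12))
    # L = I - D^{-1/2} S D^{-1/2}
    L = np.eye(S.shape[0]) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the k smallest.
    _, eigvecs = np.linalg.eigh(L)
    return eigvecs[:, :k]

def multi_view_embedding(similarities, k):
    """Stack the k-dimensional embedding of each view column-wise (assumption)."""
    return np.hstack([laplacian_embedding(S, k) for S in similarities])

def squared_residue(A, rows, cols):
    """Sum of squared residues of the bicluster A[rows, cols]."""
    B = np.asarray(A, dtype=float)[np.ix_(rows, cols)]
    H = (B
         - B.mean(axis=1, keepdims=True)   # row means
         - B.mean(axis=0, keepdims=True)   # column means
         + B.mean())                       # overall mean
    return float((H ** 2).sum())

# Two toy views of six objects, each given as a Gram matrix of random features.
rng = np.random.default_rng(0)
views = []
for _ in range(2):
    X = rng.normal(size=(6, 3))
    views.append(X @ X.T + 6.0 * np.eye(6))  # shift keeps entries well conditioned
views = [np.abs(S) for S in views]           # non-negative graph weights

E = multi_view_embedding(views, k=2)          # shape (6, 4): objects x (views * k)
score = squared_residue(E, rows=[0, 1, 2], cols=[0, 1])
```

In this sketch a bicluster is a set of objects (rows) together with a set of embedding coordinates (columns); because each column belongs to one view, a low-residue bicluster groups objects that agree on only some of the views, which is the behaviour the abstract describes.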

EPrint Type: Monograph (Technical Report)
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Learning/Statistics & Optimisation
ID Code: 5232
Deposited By: Farida Zehraoui
Deposited On: 24 March 2009