Model order selection for clustered bio-molecular data
Alberto Bertoni and Giorgio Valentini
In: Probabilistic Modeling and Machine Learning in Structural and Systems Biology, 17-18 June 2006, Tuusula, Finland.
In this paper we propose an improvement of the Ben-Hur algorithm
to assess the significance level of the clustering solutions, by introducing a quantitative approach and a statistical
test based on the distribution of suitable similarity measures between pairs of clusterings of the projected data.
We also propose a new way to perturb the data, based on random projections into lower-dimensional subspaces,
that seems to be well suited to the characteristics (high dimensionality, redundancy, noise) of genomic and proteomic data.
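
As a rough illustration of the two ingredients named above, the following sketch perturbs data with a random projection and compares two clusterings of the same items with a pairwise similarity measure. It is a minimal sketch under stated assumptions, not the authors' implementation: the Gaussian projection matrix, the Jaccard coefficient, and the function names `random_projection` and `jaccard_similarity` are all illustrative choices that the abstract does not specify.

```python
import math
import random
from itertools import combinations


def random_projection(data, target_dim, rng):
    """Project each row of `data` (a list of equal-length lists) to `target_dim` dims.

    Assumption: a Gaussian random matrix scaled by 1/sqrt(target_dim), which is
    one common kind of random projection; the abstract does not fix the type.
    """
    source_dim = len(data[0])
    matrix = [
        [rng.gauss(0.0, 1.0) / math.sqrt(target_dim) for _ in range(target_dim)]
        for _ in range(source_dim)
    ]
    return [
        [sum(row[i] * matrix[i][j] for i in range(source_dim))
         for j in range(target_dim)]
        for row in data
    ]


def jaccard_similarity(labels_a, labels_b):
    """Jaccard coefficient between two clusterings of the same items.

    Assumption: Jaccard stands in for the "suitable similarity measures".
    It counts item pairs co-clustered by both labelings, divided by item
    pairs co-clustered by at least one of them.
    """
    both = either = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        both += same_a and same_b
        either += same_a or same_b
    # If no pair is co-clustered in either labeling, treat them as identical.
    return both / either if either else 1.0


# Hypothetical toy data: 6 samples in 50 dimensions, projected down to 10.
rng = random.Random(0)
data = [[rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(6)]
projected = random_projection(data, 10, rng)
```

In a stability analysis of this kind, one would cluster several independently projected copies of the data with the same number of clusters and examine the distribution of the pairwise similarity values; the clustering step is omitted here because the abstract does not name a clustering algorithm.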