Bayesian Clustering and Model Exploration
Jim Smith, Paul E. Anderson, Kieron D. Edwards and Andrew J. Millar
In: PASCAL Workshop on Clustering, 4-5 July 2005, London, UK.
The arrival of longitudinal microarray data in biology demands the development of new types of clustering algorithms. Clustering is required over tens of thousands of time series (gene expression profiles), each with perhaps only ten time points. Further, the experiments are designed to determine which genes exhibit a particular type of qualitative structure; we shall focus on circadian genes. An alternative to clustering over points in Euclidean space is thus needed. We modify a recent Bayesian clustering algorithm to address these issues. This adaptation employs the posterior distributions of the parameters in the Bayesian models, which were originally used to score cluster partitions. We propose using them to categorise interesting clusters, and then enlist this classification in a more effective and efficient search of the vast space of possible partitions. These methods are applicable to the clustering of any time series data.