Accounting for probe-level noise in principal component analysis of microarray data
Guido Sanguinetti, Marta Milo, Magnus Rattray and Neil Lawrence
Principal Component Analysis (PCA) is one of
the most popular dimensionality reduction techniques for the analysis
of high-dimensional datasets. However, in its standard form, it accounts
for noise only through a single spherical term shared by all data points
and ignores any error measures associated with individual measurements.
This indiscriminate treatment of noise is one of its main weaknesses when
it is applied to biological data with inherently large variability, such
as expression levels measured with microarrays. Methods now exist for extracting credibility intervals
from the probe-level analysis of cDNA and oligonucleotide microarray
experiments. These credibility intervals are gene- and experiment-specific and can be propagated through
an appropriate probabilistic downstream analysis.
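The abstract does not say in what form the probe-level methods report their uncertainty. As an illustration only, if a method reports a central credibility interval for each gene in each experiment and the posterior is roughly Gaussian (an assumption made for this sketch), the interval converts into the per-entry variance that a probabilistic downstream model would consume. The function name `interval_to_variance` and the example intervals are hypothetical.

```python
from statistics import NormalDist


def interval_to_variance(lower, upper, level=0.95):
    """Variance of a Gaussian whose central `level` interval is [lower, upper].

    Assumes a symmetric, Gaussian-shaped posterior; skewed intervals would need more care.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% interval
    sd = (upper - lower) / (2.0 * z)
    return sd * sd


# Hypothetical 95% intervals on log expression: one row per gene, one entry per experiment.
intervals = [[(6.1, 6.9), (5.0, 7.4)],
             [(8.2, 8.4), (8.0, 8.6)]]
variances = [[interval_to_variance(lo, hi) for lo, hi in row] for row in intervals]
print(variances)
```

Wider intervals map to larger variances, so a downstream model can down-weight the corresponding measurements.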
We propose a new model-based approach to PCA that takes into account
the variances associated with each gene in each experiment. We
develop an efficient EM algorithm to estimate the parameters of our new model.
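The abstract does not give the model equations, so the following is a minimal sketch under assumptions made for this illustration, not the authors' implementation. Each profile y_n of d measurements is modelled as y_n = W x_n + mu + e_n with x_n ~ N(0, I_q) and independent noise e_ni ~ N(0, sigma2 + s_ni), where s_ni is the known probe-level variance of entry i of profile n. The shared variance sigma2 is held fixed rather than estimated, and the helper names (`solve`, `logdet_spd`, `e_step`, `fit`) are hypothetical. Plain Python is used so the block has no dependencies.

```python
import math
import random


def solve(A, b):
    """Solve A x = b for a small dense system (Gauss-Jordan with partial pivoting)."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                for k in range(c, n + 1):
                    M[r][k] -= f * M[c][k]
    return [M[i][n] / M[i][i] for i in range(n)]


def logdet_spd(A):
    """log|A| of a small symmetric positive-definite matrix, from the elimination pivots."""
    M = [list(row) for row in A]
    n, total = len(M), 0.0
    for c in range(n):
        total += math.log(M[c][c])
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= f * M[c][k]
    return total


def e_step(y, s, W, mu, sigma2):
    """Posterior over the latent x of one profile y whose entries have extra variances s."""
    d, q = len(W), len(W[0])
    b = [1.0 / (sigma2 + s[i]) for i in range(d)]  # per-entry noise precisions
    # Posterior precision P = I + W^T B W and h = W^T B (y - mu); the posterior mean is P^{-1} h.
    P = [[(1.0 if a == c else 0.0) + sum(W[i][a] * b[i] * W[i][c] for i in range(d))
          for c in range(q)] for a in range(q)]
    h = [sum(W[i][a] * b[i] * (y[i] - mu[i]) for i in range(d)) for a in range(q)]
    m = solve(P, h)
    S = [solve(P, [1.0 if a == c else 0.0 for a in range(q)]) for c in range(q)]  # P^{-1}
    return b, P, h, m, S


def fit(Y, Svar, q, sigma2, iters=50, seed=0):
    """EM for y_n = W x_n + mu + e_n, e_ni ~ N(0, sigma2 + Svar[n][i]), with sigma2 held fixed."""
    rng = random.Random(seed)
    N, d = len(Y), len(Y[0])
    W = [[rng.gauss(0.0, 0.1) for _ in range(q)] for _ in range(d)]
    mu = [sum(row[i] for row in Y) / N for i in range(d)]
    history = []  # marginal log-likelihood of the parameters at the start of each iteration
    for _ in range(iters):
        A = [[[0.0] * (q + 1) for _ in range(q + 1)] for _ in range(d)]
        r = [[0.0] * (q + 1) for _ in range(d)]
        ll = 0.0
        for n in range(N):
            b, P, h, m, S = e_step(Y[n], Svar[n], W, mu, sigma2)
            res = [Y[n][i] - mu[i] for i in range(d)]
            # log N(y | mu, W W^T + D) via the matrix determinant lemma and the Woodbury identity.
            ll -= 0.5 * (d * math.log(2 * math.pi) - sum(math.log(bi) for bi in b)
                         + logdet_spd(P) + sum(b[i] * res[i] ** 2 for i in range(d))
                         - sum(h[a] * m[a] for a in range(q)))
            mt = m + [1.0]  # E[x~] for the augmented latent x~ = [x; 1]
            for i in range(d):
                for a in range(q + 1):
                    r[i][a] += b[i] * Y[n][i] * mt[a]
                    for c in range(q + 1):
                        cov = S[a][c] if a < q and c < q else 0.0
                        A[i][a][c] += b[i] * (cov + mt[a] * mt[c])
        history.append(ll)
        # M-step: each row of [W | mu] solves its own precision-weighted least-squares problem.
        for i in range(d):
            wt = solve(A[i], r[i])
            W[i], mu[i] = wt[:q], wt[q]
    return W, mu, history


if __name__ == "__main__":
    rng = random.Random(1)
    w_true, mu_true = [1.0, -0.5, 0.8, 0.3], [2.0, 1.0, 0.0, -1.0]
    Y, Svar = [], []
    for _ in range(60):
        x = rng.gauss(0.0, 1.0)
        s = [rng.uniform(0.01, 0.5) for _ in range(4)]  # synthetic stand-ins for probe-level variances
        Y.append([w_true[i] * x + mu_true[i] + rng.gauss(0.0, math.sqrt(0.05 + s[i]))
                  for i in range(4)])
        Svar.append(s)
    W, mu, history = fit(Y, Svar, q=1, sigma2=0.05)
    print("log-likelihood: first %.2f, last %.2f" % (history[0], history[-1]))
```

Because the noise covariance is diagonal, the M-step decouples into one weighted least-squares problem per row of [W | mu], with weights 1/(sigma2 + s_ni), so entries with a large probe-level variance pull the loadings less. The recorded marginal log-likelihood should never decrease from one iteration to the next, which is a useful check on any EM implementation.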
The model gives significantly better results than standard PCA
while remaining computationally tractable. We show how the model
can be used to 'denoise' a microarray
data set, leading to improved expression profiles and tighter clustering
across profiles. The probabilistic nature of the model means that
the correct number of principal components is automatically obtained.
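The abstract does not specify how the denoising of a data set is carried out. One natural reading, assumed here purely for illustration, is to fit a probabilistic PCA model with per-entry noise variances sigma2 + s_ni and then replace each observed profile by its reconstruction W <x_n> + mu, where <x_n> is the posterior mean of its latent coordinates. Given fitted parameters, a self-contained sketch (the function names and the toy parameters are hypothetical):

```python
def solve(A, b):
    """Solve A x = b for a small dense system (Gauss-Jordan with partial pivoting)."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                for k in range(c, n + 1):
                    M[r][k] -= f * M[c][k]
    return [M[i][n] / M[i][i] for i in range(n)]


def denoise(y, s, W, mu, sigma2):
    """Replace profile y by W <x> + mu, where <x> is the posterior mean of its latent coordinates."""
    d, q = len(W), len(W[0])
    b = [1.0 / (sigma2 + s[i]) for i in range(d)]  # precision of each measurement
    P = [[(1.0 if a == c else 0.0) + sum(W[i][a] * b[i] * W[i][c] for i in range(d))
          for c in range(q)] for a in range(q)]  # posterior precision I + W^T B W
    h = [sum(W[i][a] * b[i] * (y[i] - mu[i]) for i in range(d)) for a in range(q)]
    m = solve(P, h)  # posterior mean of x
    return [sum(W[i][a] * m[a] for a in range(q)) + mu[i] for i in range(d)]


# Hypothetical one-component model: the third measurement is an outlier with a large variance.
W, mu = [[1.0], [1.0], [1.0]], [0.0, 0.0, 0.0]
clean = denoise([1.0, 1.0, 5.0], [0.01, 0.01, 100.0], W, mu, sigma2=0.01)
print([round(v, 3) for v in clean])  # [0.99, 0.99, 0.99]
```

The unreliable third measurement (5.0) is pulled to about 0.99 because its weight 1/(sigma2 + s) is tiny, while the two precise entries barely move.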