PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

The cluster variation method for efficient linkage analysis on extended pedigrees
Kees Albers, M.A.R. Leisink and Bert Kappen
BMC Bioinformatics, Special issue on Machine Learning in Computational Biology, pp. 1-31, 2005.


Background: Computing exact multipoint lod scores for extended pedigrees rapidly becomes infeasible as the number of markers and untyped individuals increases. When markers are excluded from the computation, significant power may be lost. Accurate approximate methods that take all markers into account are therefore desirable.

Methods: We present a novel method for efficient estimation of lod scores on extended pedigrees. Our approach is based on the Cluster Variation Method (CVM), which deterministically estimates likelihoods by performing exact computations on tractable subsets of variables (clusters) of a Bayesian network. First, a distribution over inheritances on the marker loci is approximated with the Cluster Variation Method. This distribution is then used to estimate the lod score for each location of the trait locus.

Results: First, we demonstrate that significant power may be lost if markers are ignored in the multipoint analysis. On a set of pedigrees where exact computation is possible, we compare the estimates of the lod scores obtained with our method to the exact lod scores. Second, we compare our method to a state-of-the-art MCMC sampler. When both methods are given equal computation time, our method is more efficient. Finally, we show that CVM scales to large problem instances.

Conclusions: We conclude that the Cluster Variation Method is as accurate as MCMC and is generally more efficient. Our method is a promising alternative to approaches based on MCMC sampling.
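For readers unfamiliar with the quantity being estimated, the sketch below computes a lod score in the simplest textbook setting: a two-point analysis with phase-known, fully informative meioses. It is not the Cluster Variation Method described in the abstract, and it does not handle pedigrees, multiple markers or untyped individuals. It only illustrates that a lod score is the base-10 logarithm of the likelihood at a candidate recombination fraction divided by the likelihood under no linkage (recombination fraction 0.5). The function name `two_point_lod` and its parameters are assumptions made for this illustration.

```python
import math


def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point lod score for phase-known, fully informative meioses.

    Illustrative only: `recombinants` out of `meioses` are assumed to be
    directly observable, which is rarely true for real pedigrees.
    """
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")

    non_recombinants = meioses - recombinants
    # log10 likelihood under linkage with recombination fraction theta
    log_l_theta = (recombinants * math.log10(theta)
                   + non_recombinants * math.log10(1.0 - theta))
    # log10 likelihood under the null hypothesis of no linkage (theta = 0.5)
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null


if __name__ == "__main__":
    # Hypothetical data: 2 recombinants observed in 10 meioses.
    for theta in (0.1, 0.2, 0.3, 0.5):
        print(f"theta={theta:.1f}  lod={two_point_lod(2, 10, theta):.3f}")
```

In a multipoint analysis on an extended pedigree the two likelihoods can no longer be written in closed form like this, which is why the abstract turns to approximate inference.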

EPrint Type: Article
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Learning/Statistics & Optimisation; Theory & Algorithms
ID Code: 1865
Deposited By: Bert Kappen
Deposited On: 29 November 2005