PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Two-locus association mapping in subquadratic runtime
Panagiotis Achlioptas, Bernhard Schölkopf and Karsten Borgwardt
Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining pp. 726-734, 2011.

Abstract

Genome-wide association studies (GWAS) have not been able to discover strong associations between many complex human diseases and single genetic loci. Mapping these phenotypes to pairs of genetic loci is hindered by the huge number of candidates, which leads to enormous computational and statistical problems. In GWAS on single nucleotide polymorphisms (SNPs), one has to consider on the order of 10^10 to 10^14 pairs, which is infeasible in practice. In this article, we give the first algorithm for 2-locus genome-wide association studies that is subquadratic in the number, n, of SNPs. The running time of our algorithm is data-dependent, but large experiments over real genomic data suggest that it scales empirically as n^(3/2). As a result, our algorithm can easily cope with n ~ 10^7, i.e., it can efficiently search all pairs of SNPs in the human genome.
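The pair counts in the abstract follow from the number of unordered pairs, n(n-1)/2. The short Python sketch below is illustrative only and is not taken from the paper. It tabulates that count for a few values of n and compares n^2 with the empirically observed n^(3/2) scaling. The helper names `num_snp_pairs` and `speedup_over_quadratic` are hypothetical, and constant factors are ignored, so the ratio is only a rough indication of the asymptotic gain.

```python
def num_snp_pairs(n: int) -> int:
    """Number of unordered SNP pairs an exhaustive 2-locus scan must test."""
    return n * (n - 1) // 2


def speedup_over_quadratic(n: int) -> float:
    """Ratio n^2 / n^(3/2) = sqrt(n).

    A rough (hypothetical) measure of the gain if runtime scales as n^(3/2)
    instead of n^2; constant factors are ignored.
    """
    return n ** 2 / n ** 1.5


# Tabulate pair counts and the n^2 vs. n^(3/2) gap for a few SNP-panel sizes.
for n in (10**5, 10**6, 10**7):
    print(
        f"n = {n:>10,d}  pairs = {num_snp_pairs(n):.2e}  "
        f"n^1.5 = {n ** 1.5:.2e}  n^2/n^1.5 = {speedup_over_quadratic(n):,.0f}"
    )
```

For n = 10^7 this gives roughly 5 x 10^13 pairs, whereas n^(3/2) is only about 3.2 x 10^10, a gap of a factor of sqrt(n), or about 3,000.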

EPrint Type: Article
Subjects: Learning/Statistics & Optimisation
ID Code: 8843
Deposited By: Karsten Borgwardt
Deposited On: 21 February 2012