Refined statistical inference methods for contingency table analyses in genetic association studies
Statistical inference for contingency tables is ubiquitous in genetic association analyses. Depending on the hypothesized underlying genetic model, an analysis of the association of a dichotomous endpoint (such as the diagnosis of a disease) with a set of potentially predictive bi-allelic markers can be formalized statistically as a family of tests for association in (2 x 2) or (2 x 3) contingency tables. Although the theory of exact tests for contingency tables can be traced back to Fisher (1922), it continues to challenge researchers today, in part because of interesting and unexpected phenomena originating from the discreteness of the testing problem (cf., e.g., Finner and Straßburger (2001a, 2001b)). The issue becomes even more delicate if many contingency tables, rather than a single one, have to be considered simultaneously: in this case, multiplicity correction arises as a further difficulty. Here, we focus on a specific setting. We assume that all (successfully) genotyped markers are to be evaluated simultaneously with respect to their association with a dichotomous phenotype in a confirmatory analysis (no further independent replication study, strong control of the family-wise error rate). We present an approach that combines the notion of realized randomized p-values (cf. Finner and Straßburger (2007), Finner et al. (2010)), pre-estimation of the proportion of informative markers, and the concept of "effective numbers of tests" (cf. Moskvina and Schmidt (2008) and references therein) to provide a multiplicity-adjusted threshold for marginal p-values that is typically much larger than the corresponding Bonferroni-corrected threshold.
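
To make the ingredients named above concrete, the following minimal Python sketch computes a conventional and a realized randomized one-sided p-value for a single (2 x 2) table under the hypergeometric null distribution (all margins fixed), and then compares a Bonferroni threshold with an adjusted one. It is an illustration under stated assumptions, not the procedure of the paper: the randomized p-value is taken in its standard form P(X > x) + u * P(X = x) with u uniform on (0, 1), and the rule alpha / (pi0_hat * m_eff) is only one plausible way to combine an estimated proportion of uninformative markers with an effective number of tests. All function names, parameter names and numerical inputs are hypothetical.

```python
from math import comb


def hypergeom_pmf(k, n_cases, n_controls, n_carriers):
    """P(X = k) for the number of carriers among cases, all margins fixed."""
    total = n_cases + n_controls
    return comb(n_cases, k) * comb(n_controls, n_carriers - k) / comb(total, n_carriers)


def fisher_p_values(a, b, c, d, u):
    """One-sided p-values for the table [[a, b], [c, d]].

    Rows: cases / controls. Columns: carrier / non-carrier.
    Returns (conventional, randomized), where the randomized p-value uses
    the realization u of a Uniform(0, 1) variable (standard form, assumed here).
    """
    n_cases, n_controls, n_carriers = a + b, c + d, a + c
    k_max = min(n_cases, n_carriers)
    p_equal = hypergeom_pmf(a, n_cases, n_controls, n_carriers)
    p_greater = sum(
        hypergeom_pmf(k, n_cases, n_controls, n_carriers)
        for k in range(a + 1, k_max + 1)
    )
    conventional = p_greater + p_equal      # P(X >= a)
    randomized = p_greater + u * p_equal    # P(X > a) + u * P(X = a)
    return conventional, randomized


def bonferroni_threshold(alpha, m):
    """Classical Bonferroni threshold for m marginal tests."""
    return alpha / m


def adjusted_threshold(alpha, m_eff, pi0_hat):
    """Hypothetical combination rule (an assumption of this sketch):
    divide alpha by the effective number of tests times the estimated
    proportion of uninformative (true null) markers."""
    return alpha / (pi0_hat * m_eff)


if __name__ == "__main__":
    # Small illustrative table; u would normally be drawn once per marker.
    conv, rand = fisher_p_values(3, 1, 1, 3, u=0.5)
    print("conventional p-value:", conv)
    print("randomized p-value:  ", rand)

    # Made-up inputs, purely for illustration.
    print("Bonferroni threshold:", bonferroni_threshold(0.05, 500_000))
    print("adjusted threshold:  ", adjusted_threshold(0.05, 300_000, 0.9))
```

Because u lies in (0, 1), the randomized p-value never exceeds the conventional one, which is one way in which the discreteness of the test is taken into account. Likewise, whenever m_eff <= m and pi0_hat <= 1, the adjusted threshold in this sketch is at least as large as the Bonferroni threshold.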