Five Shades of Grey: Phase Transitions in High-dimensional Multiple Testing

10/13/2019
by Zheng Gao, et al.

Motivated by marginal screening of categorical variables, we study high-dimensional multiple testing problems in which the test statistics have approximate chi-square distributions. We characterize four new phase transitions in high-dimensional chi-square models and derive the signal sizes necessary and sufficient for statistical procedures to simultaneously control false discovery (in terms of family-wise error rate or false discovery rate) and missed detection (in terms of family-wise non-discovery rate or false non-discovery rate) in large dimensions. Remarkably, the degrees of freedom of the chi-square distributions affect none of the four phase-transition boundaries. Several well-known procedures are shown to attain these boundaries. Two new phase transitions are also identified in the Gaussian location model under one-sided alternatives. We then elucidate the nature of signal sizes in association tests by characterizing their relationship with marginal frequencies, odds ratios, and sample sizes in 2×2 contingency tables. This allows us to illustrate an interesting manifestation of the phase transition phenomena in genome-wide association studies (GWAS). We also show, perhaps surprisingly, that for a given total sample size, balanced designs in such association studies rarely deliver optimal power.
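To make the setting concrete, here is a minimal sketch of marginal screening with chi-square statistics. It assumes each variable is summarized by a 2×2 contingency table tested with Pearson's statistic (1 degree of freedom, no continuity correction). It uses Bonferroni (FWER control) and Benjamini-Hochberg (FDR control) as examples of well-known procedures. The abstract does not say which procedures the paper analyzes, and the table counts below are hypothetical toy data rather than results from the paper.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the
    2x2 table [[a, b], [c, d]]; approx. chi-square(1) under independence."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def chi2_sf_df1(x):
    """Upper-tail probability P(X > x) for X ~ chi-square(1), using
    X = Z^2 with Z standard normal: P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


def bonferroni(pvals, alpha):
    """Reject H_i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, alpha):
    """Step-up BH procedure: find the largest rank k with p_(k) <= k * alpha / m
    and reject the k smallest p-values (controls the false discovery rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k_max = rank
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject


if __name__ == "__main__":
    # Hypothetical toy tables (cases/controls by allele), one per screened variable.
    tables = [(30, 10, 10, 30), (20, 20, 20, 20), (25, 15, 15, 25), (22, 18, 19, 21)]
    stats = [pearson_chi2_2x2(*t) for t in tables]
    pvals = [chi2_sf_df1(s) for s in stats]
    alpha = 0.10
    # Bonferroni rejects only the first table; BH also rejects the third.
    print("Bonferroni:", bonferroni(pvals, alpha))
    print("BH:        ", benjamini_hochberg(pvals, alpha))
```

The design choice here is deliberately simple: with 1 degree of freedom the chi-square tail reduces to a normal tail, so only the standard library is needed. For general degrees of freedom one would use a regularized incomplete gamma function instead.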
