Generalized Goodness-Of-Fit Tests for Correlated Data
This paper concerns the problem of applying generalized goodness-of-fit (gGOF) type tests to the analysis of correlated data. The gGOF family broadly covers maximum-based testing procedures built on ordered input p-values, such as the false discovery rate procedure, the Kolmogorov-Smirnov type statistics, the ϕ-divergence family, etc. A data analysis framework and a novel p-value calculation approach are developed under the Gaussian mean model and the generalized linear model (GLM). We reveal the influence of data transformations on the signal-to-noise ratio and the statistical power under both sparse and dense signal patterns and various correlation structures. In particular, the innovated transformation (IT), which is shown to be equivalent to marginal model-fitting under the GLM, is often preferred for detecting sparse signals in correlated data. We propose a testing strategy called the digGOF, which combines a double-adaptation procedure (i.e., adapting to both the statistic's formula and the truncation scheme of the input p-values) and the IT within the gGOF family. It features efficient computation and adapts robustly to the given data while retaining the advantages of the gGOF family. Relevant approaches are assessed by extensive simulations and by genetic studies of Crohn's disease and amyotrophic lateral sclerosis. Computations have been included in the R package SetTest, available on CRAN.
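To make the pipeline described in the abstract concrete, the following is a minimal sketch, not the paper's digGOF procedure and not the SetTest API. It assumes the Gaussian mean model Z ~ N(μ, Σ) with known Σ, takes the innovated transformation to be Σ⁻¹Z with each coordinate rescaled to unit variance, and uses the Higher Criticism statistic as one illustrative supremum-type statistic over ordered p-values, restricted to ranks k0 through k1 as a simple truncation scheme. All function names and parameters (`invert`, `innovated_transform`, `two_sided_pvalues`, `truncated_hc`, `k0`, `k1`) are hypothetical, and the p-value of the resulting statistic under correlation is not computed here.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def innovated_transform(z, sigma):
    """Assumed form of the IT: (Sigma^{-1} z)_i / sqrt((Sigma^{-1})_ii).

    Under Z ~ N(mu, Sigma), Var((Sigma^{-1} Z)_i) = (Sigma^{-1})_ii,
    so each output coordinate has unit variance.
    """
    omega = invert(sigma)
    n = len(z)
    return [sum(omega[i][j] * z[j] for j in range(n)) / math.sqrt(omega[i][i])
            for i in range(n)]


def two_sided_pvalues(scores):
    """Two-sided standard normal p-values: 2 * (1 - Phi(|u|)) = erfc(|u| / sqrt(2))."""
    return [math.erfc(abs(u) / math.sqrt(2.0)) for u in scores]


def truncated_hc(pvalues, k0=1, k1=None):
    """Higher Criticism over ordered p-values, restricted to ranks k0..k1 (1-based)."""
    n = len(pvalues)
    k1 = n if k1 is None else k1
    ordered = sorted(pvalues)
    best = float("-inf")
    for i in range(k0, k1 + 1):
        p = ordered[i - 1]
        if 0.0 < p < 1.0:  # skip degenerate p-values to avoid division by zero
            best = max(best, math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p)))
    return best


if __name__ == "__main__":
    # Toy equicorrelated example; the numbers are illustrative only.
    sigma = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
    z = [3.0, 0.4, -0.2]
    u = innovated_transform(z, sigma)
    print(truncated_hc(two_sided_pvalues(u), k0=1, k1=2))
```

Varying `k0` and `k1`, or swapping the function inside the maximum for another member of the family, gives a rough sense of the two dimensions over which a double-adaptation procedure would adapt.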