Generalized kernel distance covariance in high dimensions: non-null CLTs and power universality
Distance covariance is a popular dependence measure for two random vectors X and Y of possibly different dimensions and types. Recent years have witnessed concentrated efforts in the literature to understand the distributional properties of the sample distance covariance in a high-dimensional setting, with an exclusive emphasis on the null case that X and Y are independent. This paper derives the first non-null central limit theorem for the sample distance covariance, and the more general sample (Hilbert-Schmidt) kernel distance covariance in high dimensions, primarily in the Gaussian case. The new non-null central limit theorem yields an asymptotically exact first-order power formula for the widely used generalized kernel distance correlation test of independence between X and Y. The power formula in particular unveils an interesting universality phenomenon: the power of the generalized kernel distance correlation test is completely determined by n·dcor^2(X,Y)/√(2) in the high dimensional limit, regardless of a wide range of choices of the kernels and bandwidth parameters. Furthermore, this separation rate is also shown to be optimal in a minimax sense. The key step in the proof of the non-null central limit theorem is a precise expansion of the mean and variance of the sample distance covariance in high dimensions, which shows, among other things, that the non-null Gaussian approximation of the sample distance covariance involves a rather subtle interplay between the dimension-to-sample ratio and the dependence between X and Y.
READ FULL TEXT