A New Framework for Distance and Kernel-based Metrics in High Dimensions

by Shubhadeep Chakraborty, et al.

The paper presents new metrics to quantify and test for (i) the equality of distributions and (ii) the independence between two high-dimensional random vectors. We show that the energy distance based on the usual Euclidean distance cannot completely characterize the homogeneity of two high-dimensional distributions, in the sense that it detects only the equality of means and of the traces of covariance matrices in the high-dimensional setup. We propose a new class of metrics that inherits the desirable properties of the energy distance and maximum mean discrepancy (for homogeneity) and of the (generalized) distance covariance and the Hilbert-Schmidt Independence Criterion (for independence) in the low-dimensional setting, and that is capable of detecting the homogeneity of, or completely characterizing independence between, the low-dimensional marginal distributions in the high-dimensional setup. We further propose t-tests based on the new metrics to perform high-dimensional two-sample testing and independence testing, and study their asymptotic behavior under both high dimension low sample size (HDLSS) and high dimension medium sample size (HDMSS) setups. The computational complexity of the t-tests grows only linearly with the dimension, so they scale to very high-dimensional data. We demonstrate the superior power of the proposed tests for homogeneity of distributions and for independence on both simulated and real datasets.


Interpoint Distance Based Two Sample Tests in High Dimension

In this paper, we study a class of two sample test statistics based on i...

Distance-based and RKHS-based Dependence Metrics in High Dimension

In this paper, we study distance covariance, Hilbert-Schmidt covariance ...

On high-dimensional modifications of some graph-based two-sample tests

Testing for the equality of two high-dimensional distributions is a chal...

High-Dimensional Independence Testing and Maximum Marginal Correlation

A number of universally consistent dependence measures have been recentl...

MultiFIT: A Multivariate Multiscale Framework for Independence Tests

We present a framework for testing independence between two random vecto...

Double Data Piling for Heterogeneous Covariance Models

In this work, we characterize two data piling phenomenon for a high-dime...

On High Dimensional Behaviour of Some Two-Sample Tests Based on Ball Divergence

In this article, we propose some two-sample tests based on ball divergen...
