Measuring Association on Topological Spaces Using Kernels and Geometric Graphs
In this paper we propose and study a class of simple, nonparametric, yet interpretable measures of association between two random variables X and Y taking values in general topological spaces. These nonparametric measures, defined using the theory of reproducing kernel Hilbert spaces, capture the strength of dependence between X and Y and have the property that they are 0 if and only if the variables are independent and 1 if and only if one variable is a measurable function of the other. Further, these population measures can be consistently estimated using the general framework of graph functionals, which includes k-nearest neighbor graphs and minimum spanning trees. Moreover, a sub-class of these estimators is also shown to adapt to the intrinsic dimensionality of the underlying distribution. Some of these empirical measures can also be computed in near-linear time. Under the hypothesis of independence between X and Y, these empirical measures (properly normalized) have a standard normal limiting distribution. Thus, these measures can also be readily used to test the hypothesis of mutual independence between X and Y. In fact, as far as we are aware, these are the only procedures that possess all the above-mentioned desirable properties. Furthermore, when restricting to Euclidean spaces, we can make these sample measures of association finite-sample distribution-free, under the hypothesis of independence, by using multivariate ranks defined via the theory of optimal transport. The recent correlation coefficients proposed in Dette et al. (2013), Chatterjee (2019), and Azadkia and Chatterjee (2019) can be seen as special cases of this general class of measures.
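To make the simplest special case concrete, the sketch below computes the rank-based coefficient of Chatterjee (2019) for real-valued samples. It is an illustration under stated assumptions, not the kernel and graph-based estimator proposed in the paper: it assumes continuous data with no ties, and the function names `chatterjee_xi` and `independence_z` are hypothetical. The normalization in `independence_z` uses the known limiting variance 2/5 of this particular coefficient under independence; the normalization for the general class of measures is not stated in the abstract.

```python
import math
import random


def chatterjee_xi(x, y):
    """Chatterjee's rank coefficient (no-ties form); assumes continuous data."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    # Reorder the y values so that the corresponding x values are increasing.
    order = sorted(range(n), key=lambda i: x[i])
    y_by_x = [y[i] for i in order]
    # ranks[i] is the rank (1..n) of the i-th y value in x-sorted order.
    ranks = [0] * n
    for r, i in enumerate(sorted(range(n), key=lambda i: y_by_x[i]), start=1):
        ranks[i] = r
    # Sum of absolute rank jumps between consecutive points along x.
    total = sum(abs(ranks[i + 1] - ranks[i]) for i in range(n - 1))
    return 1.0 - 3.0 * total / (n * n - 1)


def independence_z(x, y):
    """Normalized statistic sqrt(n) * xi / sqrt(2/5), approximately N(0, 1)
    under independence for this specific coefficient."""
    n = len(x)
    return math.sqrt(n) * chatterjee_xi(x, y) / math.sqrt(2.0 / 5.0)


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    # Y is a (non-monotone) function of X: the coefficient should be near 1.
    print(chatterjee_xi(xs, [v * v for v in xs]))
    # Y is independent of X: the coefficient should be near 0.
    print(chatterjee_xi(xs, noise), independence_z(xs, noise))
```

Sorting dominates the cost, so this runs in O(n log n) time, in line with the near-linear computation mentioned above. For a perfectly monotone sample of size n the no-ties formula gives 1 - 3/(n + 1), which approaches 1 only as n grows.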