Hypothesis Testing of One-Sample Mean Vector in Distributed Frameworks

10/06/2021
by Bin Du, et al.

Distributed frameworks are widely used to handle massive data, where the sample size n is very large and the data are often stored on k different machines. For a random vector X ∈ ℝ^p with expectation μ, testing the mean vector, H_0: μ = μ_0 vs. H_1: μ ≠ μ_0 for a given vector μ_0, is a basic problem in statistics. Centralized test statistics incur heavy communication costs, which can be a burden when p or k is large. To reduce the communication cost, this paper proposes distributed test statistics for this problem based on the divide and conquer technique, a commonly used approach for distributed statistical inference. Specifically, we extend two commonly used centralized test statistics to distributed versions, which apply to the low- and high-dimensional cases, respectively. Comparing the power of the centralized test statistics with that of the distributed ones, we observe a fundamental tradeoff between communication cost and the power of the tests. This is quite different from the application of the divide and conquer technique to many other problems, such as estimation, where the associated distributed statistics can be as good as the centralized ones. Numerical results confirm the theoretical findings.
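
To make the tradeoff concrete, here is a minimal sketch for the low-dimensional case. It is an illustration under stated assumptions, not necessarily the statistic proposed in the paper: the abstract does not name the two centralized tests, so the sketch assumes the classical Hotelling T^2 statistic and a divide and conquer variant in which each machine sends only its local T^2 value, with the values then summed. The function names `hotelling_t2` and `distributed_t2` and the simulated data are hypothetical. Under H_0 with large local sample sizes, the centralized statistic is approximately chi-square with p degrees of freedom, while the sum of k independent local statistics is approximately chi-square with kp degrees of freedom. The extra degrees of freedom are one way to see why a lower communication cost can reduce power.

```python
import numpy as np

def hotelling_t2(x, mu0):
    """One-sample Hotelling T^2 = n (xbar - mu0)' S^{-1} (xbar - mu0)."""
    n = x.shape[0]
    diff = x.mean(axis=0) - mu0
    # Sample covariance (ddof=1); atleast_2d keeps the p = 1 case working.
    s = np.atleast_2d(np.cov(x, rowvar=False))
    return float(n * diff @ np.linalg.solve(s, diff))

def distributed_t2(blocks, mu0):
    """Assumed divide and conquer variant: sum of per-machine T^2 values.

    Each machine transmits one scalar instead of its local mean vector
    (p numbers) and covariance matrix (p(p+1)/2 numbers).
    """
    return sum(hotelling_t2(b, mu0) for b in blocks)

# Simulated data (hypothetical): n samples in p dimensions split over k machines.
rng = np.random.default_rng(0)
n, p, k = 6000, 3, 10
mu0 = np.zeros(p)
x = rng.normal(loc=0.05, size=(n, p))  # true mean deviates slightly from mu0
blocks = np.array_split(x, k)          # one block of rows per machine

t_central = hotelling_t2(x, mu0)       # compare with chi-square, p degrees of freedom
t_dist = distributed_t2(blocks, mu0)   # compare with chi-square, k*p degrees of freedom
print(f"centralized T^2 = {t_central:.2f} (df = {p})")
print(f"distributed T^2 = {t_dist:.2f} (df = {k * p})")
```

With k = 1 the two statistics coincide. As k grows, the distributed statistic is compared against a reference distribution with more degrees of freedom, so the same deviation from μ_0 is harder to detect.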
