Massively Parallel Algorithms for High-Dimensional Euclidean Minimum Spanning Tree
We study the classic Euclidean Minimum Spanning Tree (MST) problem in the Massively Parallel Computation (MPC) model. Given a set X ⊂ ℝ^d of n points, the goal is to produce a spanning tree for X with weight within a small factor of optimal. Euclidean MST is one of the most fundamental hierarchical geometric clustering algorithms, and with the proliferation of enormous high-dimensional data sets, such as massive transformer-based embeddings, there is now a critical demand for efficient distributed algorithms to cluster such data sets. In low-dimensional space, where d = O(1), Andoni, Nikolov, Onak, and Yaroslavtsev [STOC '14] gave a constant-round MPC algorithm that obtains a high-accuracy (1+ϵ)-approximate solution. However, the situation is much more challenging for high-dimensional spaces: the best-known algorithm to obtain a constant approximation requires O(log n) rounds. Recently, Chen, Jayaram, Levi, and Waingarten [STOC '22] gave an Õ(log n)-approximation algorithm in a constant number of rounds based on embeddings into tree metrics. However, to date, no known algorithm achieves both a constant number of rounds and a constant-factor approximation. In this paper, we make strong progress on this front by giving a constant-factor approximation in Õ(log log n) rounds of the MPC model. In contrast to tree-embedding-based approaches, which necessarily must pay Ω(log n) distortion, our algorithm is based on a new combination of graph-based distributed MST algorithms and geometric space partitions. Additionally, although the approximate MST we return can have a large depth, we show that it can be modified to obtain an Õ(log log n)-round constant-factor approximation to the Euclidean Traveling Salesman Problem (TSP) in the MPC model. Previously, only an O(log n)-round algorithm was known for this problem.