Massively Parallel Algorithms and Hardness for Single-Linkage Clustering Under ℓ_p-Distances

10/04/2017
by   Grigory Yaroslavtsev, et al.
0

We present massively parallel (MPC) algorithms and hardness of approximation results for computing Single-Linkage Clustering of n input d-dimensional vectors under Hamming, ℓ_1, ℓ_2 and ℓ_∞ distances. All our algorithms run in O( n) rounds of MPC for any fixed d and achieve (1+ϵ)-approximation for all distances (except Hamming for which we show an exact algorithm). We also show constant-factor inapproximability results for o( n)-round algorithms under standard MPC hardness assumptions (for sufficiently large dimension depending on the distance used). Efficiency of implementation of our algorithms in Apache Spark is demonstrated through experiments on a variety of datasets exhibiting speedups of several orders of magnitude.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset