Massively Parallel Single-Source SimRanks in o(log n) Rounds

by Siqiang Luo, et al.

SimRank is one of the most fundamental measures of structural similarity between two nodes in a graph and has been applied in a plethora of data management tasks. These tasks often involve single-source SimRank computation, which evaluates the SimRank values between a source node s and all other nodes. Because of its high computational complexity, single-source SimRank computation on large graphs is notoriously challenging, and hence recent studies resort to distributed processing. To our surprise, although SimRank has been widely adopted for two decades, the theoretical aspects of distributed SimRank computation with provable results have rarely been studied. In this paper, we conduct a theoretical study of single-source SimRank computation in the Massively Parallel Computation (MPC) model, which is the standard theoretical framework for modeling distributed systems such as MapReduce, Hadoop, or Spark. Existing distributed SimRank algorithms require either Ω(log n) communication rounds or Ω(n) space per machine for a graph of n nodes. We overcome this barrier. In particular, given a graph of n nodes, for any query node v and constant error ϵ>3/n, we show that O(log^2 log n) rounds of communication among machines are almost enough to compute single-source SimRank values with at most ϵ absolute error, while each machine needs only space sublinear in n. To the best of our knowledge, this is the first single-source SimRank algorithm in MPC that overcomes the Θ(log n) round complexity barrier with provable result accuracy.
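To make the single-source task concrete, the following is a minimal single-machine sketch of the standard iterative SimRank recurrence (a node is fully similar to itself, and two distinct nodes are similar in proportion to the average similarity of their in-neighbors, scaled by a decay factor). It is not the MPC algorithm of the paper. The function names, the in-neighbor dictionary representation, the decay factor `c=0.6`, and the toy graph are all assumptions chosen for illustration.

```python
from itertools import product


def simrank_all_pairs(in_nbrs, c=0.6, iters=10):
    """Naive iterative SimRank over all node pairs (illustrative only).

    in_nbrs maps each node to the list of its in-neighbors.
    c is the decay factor (assumed value), iters the number of iterations.
    """
    nodes = list(in_nbrs)
    # Iteration 0: a node is similar only to itself.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iters):
        new = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new[(a, b)] = 1.0
            elif not in_nbrs[a] or not in_nbrs[b]:
                # A node without in-neighbors has similarity 0 to every other node.
                new[(a, b)] = 0.0
            else:
                total = sum(sim[(i, j)] for i in in_nbrs[a] for j in in_nbrs[b])
                new[(a, b)] = c * total / (len(in_nbrs[a]) * len(in_nbrs[b]))
        sim = new
    return sim


def single_source_simrank(in_nbrs, source, c=0.6, iters=10):
    """Return the SimRank value between `source` and every node."""
    sim = simrank_all_pairs(in_nbrs, c, iters)
    return {v: sim[(source, v)] for v in in_nbrs}


# Hypothetical toy graph with edges a -> b and a -> c.
toy_in_nbrs = {"a": [], "b": ["a"], "c": ["a"]}
scores = single_source_simrank(toy_in_nbrs, "b")
```

On this toy graph, nodes b and c share their only in-neighbor, so their similarity equals the decay factor, while a has no in-neighbors and scores 0 against b. The sketch touches every pair of nodes in every iteration, which is the kind of cost that motivates distributed approaches on large graphs.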




