Distributed Data Summarization in Well-Connected Networks

08/01/2019
by Hsin-Hao Su, et al.

We study distributed algorithms for several fundamental problems in data summarization. Given a communication graph $G$ of $n$ nodes, each of which may initially hold a value, we focus on computing $\sum_{i=1}^{N} g(f_i)$, where $f_i$ is the number of occurrences of value $i$ and $g$ is a fixed function. This covers important statistics such as the number of distinct elements, the frequency moments, and the empirical entropy of the data. In the CONGEST model, a simple adaptation of streaming lower bounds shows that computing some of these statistics exactly requires $\tilde{\Omega}(D + n)$ rounds, where $D$ is the diameter of the graph. However, these lower bounds do not hold for well-connected graphs. We give an algorithm that computes $\sum_{i=1}^{N} g(f_i)$ exactly in $\tau_G \cdot 2^{O(\sqrt{\log n})}$ rounds, where $\tau_G$ is the mixing time of $G$. This also has applications to computing the top $k$ most frequent elements. We demonstrate that the GOSSIP model and the CONGEST model are highly similar in well-connected graphs. In particular, we show that each round of the GOSSIP model can be simulated almost perfectly in $\tilde{O}(\tau_G)$ rounds of the CONGEST model. To this end, we develop a new algorithm for the GOSSIP model that $(1 \pm \epsilon)$-approximates the $p$-th frequency moment $F_p = \sum_{i=1}^{N} f_i^p$ in $\tilde{O}(\epsilon^{-2} n^{1-k/p})$ rounds, for $p > 2$, when the number of distinct elements $F_0$ is at most $O(n^{1/(k-1)})$. This result can be translated back to the CONGEST model with a factor $\tilde{O}(\tau_G)$ blow-up in the number of rounds.
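
To make the target quantity concrete, the following is a minimal centralized sketch of the sum of g(f_i) for the statistics named above. It is not the distributed algorithm from the paper: it assumes all values are already gathered in one list, and it assumes g(0) = 0 so that values which never occur can be skipped. The function names (`summarize`, `distinct_elements`, `frequency_moment`, `empirical_entropy`, `top_k`) are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def summarize(values, g):
    """Return the sum of g(f_i) over every value i that occurs at least once.

    Assumes g(0) = 0, so values with zero occurrences contribute nothing.
    """
    freqs = Counter(values)  # f_i = number of occurrences of value i
    return sum(g(f) for f in freqs.values())


def distinct_elements(values):
    # F_0: g(f) = 1 for every value with f > 0
    return summarize(values, lambda f: 1)


def frequency_moment(values, p):
    # F_p: g(f) = f^p
    return summarize(values, lambda f: f ** p)


def empirical_entropy(values):
    # Entropy in bits: g(f) = (f / m) * log2(m / f), with m the total count
    m = len(values)
    return summarize(values, lambda f: (f / m) * log2(m / f))


def top_k(values, k):
    # The k most frequent values with their counts, most frequent first
    return Counter(values).most_common(k)


if __name__ == "__main__":
    data = [3, 1, 3, 2, 3, 1]  # illustrative input, not from the paper
    print(distinct_elements(data))
    print(frequency_moment(data, 2))
    print(empirical_entropy(data))
    print(top_k(data, 1))
```

Each statistic differs only in the choice of g, which is why a single routine for the general sum covers all of them.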
