Streaming Frequent Items with Timestamps and Detecting Large Neighborhoods in Graph Streams

11/20/2019
by Christian Konrad, et al.
Detecting frequent items is a fundamental problem in data streaming research. In many applications, however, metadata such as the timestamps at which the frequent items appeared, or other application-specific data that "arrives" with the frequent items, must be reported along with the items themselves. To this end, we introduce the Neighborhood Detection problem in graph streams, which both accurately models such situations and addresses the fundamental problem of detecting large neighborhoods or stars in graph streams. In Neighborhood Detection, an algorithm receives the edges of a bipartite graph G = (A, B, E) with |A| = n and |B| = poly(n) in arbitrary order and is given a threshold parameter d. Provided that there is at least one A-node of degree at least d, the objective is to output a node a ∈ A together with at least d/c of its neighbors, where c is the approximation factor. We show that in insertion-only streams, there is a one-pass c-approximation streaming algorithm with space Õ(n + n^(1/c) d), for integral values of c > 2. We complement this result with a lower bound, showing that computing a (c/1.01)-approximation requires space Ω(n/c^2 + n^(1/(c-1)) d/c^2), for any integral c > 2, which renders our algorithm optimal for a large range of settings (up to logarithmic factors). In insertion-deletion (turnstile) streams, we give a one-pass c-approximation algorithm with space Õ(dn/c^2) (if c < √n). We also prove that this is best possible up to logarithmic factors. Our lower bounds are obtained by defining new multi-party and two-party communication problems, respectively, and proving lower bounds on their communication complexities using information-theoretic arguments.
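
To make the problem statement concrete, the following is a minimal Python sketch of a naive baseline for the insertion-only setting. It is not the algorithm from the paper: it simply stores up to ⌈d/c⌉ distinct neighbors for every A-node and reports the first node that reaches that count. The function name `detect_neighborhood` and the representation of the stream as `(a, b)` pairs are assumptions made for illustration only.

```python
from collections import defaultdict
from math import ceil


def detect_neighborhood(edges, d, c):
    """Naive baseline (NOT the paper's algorithm).

    edges: iterable of (a, b) pairs, a in A and b in B, in arbitrary order.
    d:     degree threshold.
    c:     approximation factor.

    Returns (a, neighbors) with at least ceil(d / c) neighbors of a,
    or None if no A-node collects that many distinct neighbors.
    """
    target = ceil(d / c)
    seen = defaultdict(set)  # A-node -> distinct B-neighbors seen so far

    for a, b in edges:
        neighbors = seen[a]
        neighbors.add(b)  # a set ignores repeated edges
        if len(neighbors) >= target:
            # Stop at the first node that has enough neighbors.
            return a, sorted(neighbors)
    return None


# Hypothetical toy stream: a1 has degree 4, a2 has degree 1.
stream = [("a1", "b1"), ("a2", "b1"), ("a1", "b2"), ("a1", "b3"), ("a1", "b4")]
result = detect_neighborhood(stream, d=4, c=2)
```

If some A-node has degree at least d, this baseline always succeeds, since that node eventually accumulates ⌈d/c⌉ stored neighbors. Its worst-case storage is on the order of n·d/c neighbors, which is generally far more than the space bound stated in the abstract; the sketch only illustrates the input and output of the problem.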
