Using Colors and Sketches to Count Subgraphs in a Streaming Graph

02/23/2023
by Shirin Handjani, et al.

Suppose we wish to estimate $\#H$, the number of copies of some small graph $H$ in a large streaming graph $G$. There are many algorithms for this task when $H$ is a triangle, but only a few that apply to arbitrary $H$. Here we focus on one such algorithm, which was introduced by Kane, Mehlhorn, Sauerwald, and Sun. The storage and the update time per edge for their algorithm are both $O(m^k/(\#H)^2)$, where $m$ is the number of edges in $G$ and $k$ is the number of edges in $H$. We propose three modifications to their algorithm that can dramatically reduce both the storage and the update time. Suppose that $H$ has no leaves and that $G$ has maximum degree at most $m^{1/2-\alpha}$, where $\alpha > 0$. Define $C = \min(m^{2\alpha}, m^{1/3})$. Then in our version of the algorithm, the update time per edge is $O(1)$, and the storage is reduced by a factor of approximately $C^{2k-t-2}$, where $t$ is the number of vertices in $H$; in particular, the storage is $O(C^2 + m^k/(C^{2k-t-2}(\#H)^2))$.
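To get a feel for the size of the claimed saving, the two storage bounds can be evaluated numerically. The following is only an illustrative sketch: it drops the constants hidden by the O-notation, the function names `tradeoff_factor` and `storage_bounds` are hypothetical, and the input values in the usage lines are made up for illustration rather than taken from the paper.

```python
def tradeoff_factor(m: float, alpha: float) -> float:
    """C = min(m^(2*alpha), m^(1/3)), as defined in the abstract."""
    return min(m ** (2 * alpha), m ** (1 / 3))


def storage_bounds(m: float, k: int, t: int, count_h: float, alpha: float):
    """Return (original, modified) storage bounds with hidden constants ignored.

    m        number of edges in G
    k        number of edges in H
    t        number of vertices in H
    count_h  #H, the number of copies of H in G
    alpha    G is assumed to have maximum degree at most m^(1/2 - alpha)
    """
    c = tradeoff_factor(m, alpha)
    # Bound stated for the Kane-Mehlhorn-Sauerwald-Sun algorithm: m^k / (#H)^2.
    original = m ** k / count_h ** 2
    # Bound stated for the modified algorithm: C^2 + m^k / (C^(2k-t-2) (#H)^2).
    modified = c ** 2 + original / c ** (2 * k - t - 2)
    return original, modified


# Hypothetical inputs: a triangle (k = 3, t = 3) in a graph with 10^6 edges,
# 10^6 triangles, and alpha = 0.25. Here 2k - t - 2 = 1, so the first term
# shrinks by a single factor of C.
original, modified = storage_bounds(m=10**6, k=3, t=3, count_h=10**6, alpha=0.25)
print(f"original bound ~ {original:.3g}, modified bound ~ {modified:.3g}")
```

For denser patterns the exponent $2k-t-2$ grows quickly (for a 4-clique, $k = 6$ and $t = 4$ give an exponent of 6), which is where the reduction in the first term becomes most pronounced; the additive $C^2$ term then limits how small the total can get.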
