Using Colors and Sketches to Count Subgraphs in a Streaming Graph
Suppose we wish to estimate #H, the number of copies of some small graph H in a large streaming graph G. There are many algorithms for this task when H is a triangle, but just a few that apply to arbitrary H. Here we focus on one such algorithm, which was introduced by Kane, Mehlhorn, Sauerwald, and Sun. The storage and update time per edge for their algorithm are both O(m^k/(#H)^2), where m is the number of edges in G, and k is the number of edges in H. Here, we propose three modifications to their algorithm that can dramatically reduce both the storage and update time. Suppose that H has no leaves and that G has maximum degree ≤ m^{1/2 - α}, where α > 0. Define C = min(m^{2α}, m^{1/3}). Then in our version of the algorithm, the update time per edge is O(1), and the storage is approximately reduced by a factor of C^{2k-t-2}, where t is the number of vertices in H; in particular, the storage is O(C^2 + m^k/(C^{2k-t-2} (#H)^2)).
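To get a feel for the size of the claimed savings, the bounds above can be evaluated numerically. The following is a rough sketch only: it drops the constants hidden in the O(·) notation, and the function names (`cap_parameter`, `storage_bounds`) and the sample inputs are hypothetical choices made for illustration, not part of the algorithm itself.

```python
def cap_parameter(m, alpha):
    """C = min(m^(2*alpha), m^(1/3)), as defined in the abstract."""
    return min(m ** (2 * alpha), m ** (1 / 3))


def storage_bounds(m, k, t, alpha, num_copies):
    """Evaluate the storage bounds with all hidden constants set to 1.

    m          -- number of edges in G
    k          -- number of edges in H
    t          -- number of vertices in H
    alpha      -- exponent gap in the degree bound m^(1/2 - alpha)
    num_copies -- #H, the number of copies of H in G (assumed known here)
    """
    C = cap_parameter(m, alpha)
    baseline = m ** k / num_copies ** 2       # m^k / (#H)^2
    reduction = C ** (2 * k - t - 2)          # C^(2k - t - 2)
    modified = C ** 2 + baseline / reduction  # C^2 + m^k / (C^(2k-t-2) (#H)^2)
    return C, reduction, baseline, modified


if __name__ == "__main__":
    # Illustrative inputs only: H is a triangle (k = 3, t = 3, no leaves).
    C, reduction, baseline, modified = storage_bounds(
        m=10**6, k=3, t=3, alpha=0.25, num_copies=10**6
    )
    print(f"C = {C:.1f}, reduction factor = {reduction:.1f}")
    print(f"baseline ~ {baseline:.3g}, modified ~ {modified:.3g}")
```

For a triangle the exponent 2k - t - 2 equals 1, so under these assumptions the reduction factor is simply C; denser patterns H, with more edges relative to vertices, give a larger exponent.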