Connected Components for Infinite Graph Streams: Theory and Practice

11/30/2021
by   Jonathan W. Berry, et al.

Motivated by the properties of unending real-world cybersecurity streams, we present a new graph streaming model: XStream. We maintain a streaming graph and its connected components at single-edge granularity. In cybersecurity graph applications, input streams typically consist of edge insertions; individual deletions are not explicit. Analysts maintain as much history as possible and will trigger customized bulk deletions when necessary. Despite a variety of dynamic graph processing systems and some canonical literature on theoretical sliding-window graph streaming, XStream is the first model explicitly designed to accommodate this usage model. Users can provide Boolean predicates to define bulk deletions. Edge arrivals are expected to occur continuously and must always be handled. XStream is implemented via a ring of finite-memory processors. We give algorithms to maintain connected components on the input stream, to answer queries about connectivity, and to perform bulk deletion. The system requires bandwidth for internal messages that is some constant factor greater than the stream arrival rate. We prove a relationship among four quantities: the proportion of query downtime allowed, the proportion of edges that survive an aging event, the proportion of duplicated edges, and the bandwidth expansion factor. In addition to presenting the theory behind XStream, we present computational results for a single-threaded prototype implementation. Stream ingestion rates are bounded by computer architecture. We determine this bound for XStream inter-process message-passing rates in Intel TBB applications on Intel Sky Lake processors: between one and five million graph edges per second. Our single-threaded prototype runs our full protocols through multiple aging events at between one half and one million edges per second, and we give ideas for speeding this up by orders of magnitude.
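The usage model described in the abstract (insert-only edge arrivals, connectivity queries, and predicate-driven bulk deletion) can be illustrated with a minimal single-machine sketch. This is not the XStream ring-of-processors algorithm: it is a plain union-find structure that is rebuilt from the surviving edges after each bulk deletion, and every name in it (`StreamingComponents`, `insert_edge`, `connected`, `bulk_delete`, the `(u, v, timestamp)` edge layout) is a hypothetical choice made for illustration.

```python
from typing import Callable, Dict, Hashable, List, Tuple

# Hypothetical edge layout: (endpoint, endpoint, arrival timestamp).
Edge = Tuple[Hashable, Hashable, float]


class StreamingComponents:
    """Illustrative only: union-find over an insert-only edge stream,
    with bulk deletion defined by a user-supplied Boolean predicate."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}
        self.edges: List[Edge] = []  # retained history of edges

    def _find(self, x: Hashable) -> Hashable:
        # Register unseen vertices as singleton components.
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every vertex on the path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def _union(self, u: Hashable, v: Hashable) -> None:
        ru, rv = self._find(u), self._find(v)
        if ru != rv:
            self.parent[ru] = rv

    def insert_edge(self, u: Hashable, v: Hashable, timestamp: float) -> None:
        """Handle one edge arrival: store it and merge its endpoints."""
        self.edges.append((u, v, timestamp))
        self._union(u, v)

    def connected(self, u: Hashable, v: Hashable) -> bool:
        """Connectivity query on the current graph."""
        return self._find(u) == self._find(v)

    def bulk_delete(self, predicate: Callable[[Edge], bool]) -> int:
        """Drop every edge for which predicate(edge) is True, then
        rebuild the components from the survivors. Returns the number
        of edges removed."""
        survivors = [e for e in self.edges if not predicate(e)]
        removed = len(self.edges) - len(survivors)
        self.edges = survivors
        self.parent = {}
        for u, v, _ in survivors:
            self._union(u, v)
        return removed


g = StreamingComponents()
g.insert_edge("a", "b", 1.0)
g.insert_edge("b", "c", 2.0)
g.insert_edge("d", "e", 3.0)
# Age out all edges older than timestamp 2.0.
removed = g.bulk_delete(lambda e: e[2] < 2.0)
```

The rebuild makes queries unavailable while it runs, which is one simple way to picture the query downtime that the abstract relates to the proportion of edges surviving an aging event.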

research
03/28/2022

GraphZeppelin: Storage-Friendly Sketching for Connected Components on Dynamic Graph Streams

Finding the connected components of a graph is a fundamental problem wit...
research
09/13/2017

Approximate Integration of streaming data

We approximate analytic queries on streaming data with a weighted reserv...
research
10/27/2020

Improved Algorithms for Edge Colouring in the W-Streaming Model

In the W-streaming model, an algorithm is given O(n polylog n) space and...
research
07/22/2018

Independent Sets in Vertex-Arrival Streams

We consider the classic maximal and maximum independent set problems in ...
research
10/31/2021

On multiple IoT data streams processing using LoRaWAN

LoraWAN has turned out to be one of the most successful frameworks in Io...
research
11/07/2017

SWOOP: Top-k Similarity Joins over Set Streams

We provide efficient support for applications that aim to continuously f...
research
09/20/2017

SBG-Sketch: A Self-Balanced Sketch for Labeled-Graph Stream Summarization

Applications in various domains rely on processing graph streams, e.g., ...
