Hybrid Edge Partitioner: Partitioning Large Power-Law Graphs under Memory Constraints

by   Ruben Mayer, et al.

Distributed systems that manage and process graph-structured data internally solve a graph partitioning problem to minimize their communication overhead and query run-time. Besides computational complexity – optimal graph partitioning is NP-hard – another important consideration is the memory overhead. Real-world graphs often have an immense size, such that loading the complete graph into memory for partitioning is not economical or feasible. Currently, the common approach to reduce memory overhead is to rely on streaming partitioning algorithms. While the latest streaming algorithms lead to reasonable partitioning quality on some graphs, they are still not completely competitive to in-memory partitioners. In this paper, we propose a new system, Hybrid Edge Partitioner (HEP), that can partition graphs that fit partly into memory while yielding a high partitioning quality. HEP can flexibly adapt its memory overhead by separating the edge set of the graph into two sub-sets. One sub-set is partitioned by NE++, a novel, efficient in-memory algorithm, while the other sub-set is partitioned by a streaming approach. Our evaluations on large real-world graphs show that in many cases, HEP outperforms both in-memory partitioning and streaming partitioning at the same time. Hence, HEP is an attractive alternative to existing solutions that cannot fine-tune their memory overheads. Finally, we show that using HEP, we achieve a significant speedup of distributed graph processing jobs on Spark/GraphX compared to state-of-the-art partitioning algorithms.


page 9

page 11


Out-of-Core Edge Partitioning at Linear Run-Time

Graph edge partitioning is an important preprocessing step to optimize d...

Streaming Graph Challenge: Stochastic Block Partition

An important objective for analyzing real-world graphs is to achieve sca...

Graph Partitioning via Parallel Submodular Approximation to Accelerate Distributed Machine Learning

Distributed computing excels at processing large scale data, but the com...

TurboGraph++: A Scalable and Fast Graph Analytics System

Existing distributed graph analytics systems are categorized into two ma...

System-aware dynamic partitioning for batch and streaming workloads

When processing data streams with highly skewed and nonstationary key di...

ADWISE: Adaptive Window-based Streaming Edge Partitioning for High-Speed Graph Processing

In recent years, the graph partitioning problem gained importance as a m...

Streaming Hypergraph Partitioning Algorithms on Limited Memory Environments

Many well-known, real-world problems involve dynamic data which describe...

Please sign up or login with your details

Forgot password? Click here to reset