Accelerating PageRank using Partition-Centric Processing
PageRank is a fundamental link analysis algorithm and a key representative of the performance of other graph algorithms and Sparse Matrix Vector (SpMV) multiplication. Calculating PageRank on sparse graphs generates large amount of random memory accesses resulting in low cache line utilization and poor use of memory bandwidth. In this paper, we present a novel Partition-Centric Processing Methodology (PCPM) that drastically reduces the amount of communication with DRAM and achieves high memory bandwidth. Similar to the state of the art Binning with Vertex-centric Gather-Apply-Scatter (BVGAS) method, PCPM performs partition wise scatter and gather of updates with both phases enjoying full cache line utilization. However, BVGAS suffers from random memory accesses and redundant read/write of update values from nodes to their neighbors. In contrast, PCPM propagates single update from source node to all destinations in a partition, thus decreasing redundancy effectively. We make use of this characteristic to develop a novel bipartite Partition-Node Graph (PNG) data layout for PCPM, that enables streaming memory accesses, with very little generation overhead. We perform detailed analysis of PCPM and provide theoretical bounds on the amount of communication and random DRAM accesses. We experimentally evaluate our approach using 6 large graph datasets and demonstrate an average 2.7x speedup in execution time and 1.7x reduction in communication, compared to the state of the art. We also show that unlike the BVGAS implementation, PCPM is able to take advantage of intelligent node labeling that enhances locality in graphs, by further reducing the amount of communication with DRAM. Although we use PageRank as the target application in this paper, our approach can be applied to generic SpMV computation.
READ FULL TEXT