Optimizing Graph Processing and Preprocessing with Hardware Assisted Propagation Blocking
Extensive prior research has focused on alleviating the characteristically poor cache locality of graph analytics workloads. Graph pre-processing tasks, however, remain relatively unexplored, even though in many important scenarios they can be as expensive as the downstream graph analytics kernel. We observe that Propagation Blocking (PB), a software optimization designed for sparse matrix-vector multiplication (SpMV) kernels, generalizes to many graph analytics kernels as well as to common pre-processing tasks. In this work, we identify the inefficiencies that remain when PB runs on conventional multicores and propose architecture support that eliminates these bottlenecks, further improving the performance gains from PB. Our proposed architecture, COBRA, optimizes the PB execution of graph processing and pre-processing alike, providing end-to-end speedups of up to 4.6x (3.5x on average).
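For readers unfamiliar with Propagation Blocking, the general two-phase idea can be sketched in software as follows. This is a minimal Python illustration of the commonly described binning and accumulate phases, not the paper's implementation and not a model of COBRA; the function name `scatter_add_pb`, the edge-list input, and the `bin_width` parameter are all hypothetical choices made for this example.

```python
def scatter_add_pb(edges, contrib, num_vertices, bin_width):
    """Sum contrib[src] into each destination vertex using two PB-style phases.

    edges        -- list of (src, dst) pairs (assumed input format)
    contrib      -- per-source value to propagate along each outgoing edge
    bin_width    -- number of consecutive destination IDs owned by one bin
                    (hypothetical tuning knob; in practice chosen so one
                    bin's slice of the output fits in cache)
    """
    num_bins = (num_vertices + bin_width - 1) // bin_width
    bins = [[] for _ in range(num_bins)]

    # Phase 1 (binning): stream over the edges and append each update to the
    # bin that owns its destination. Writes to a bin are sequential appends,
    # so no random accesses into the output array happen here.
    for src, dst in edges:
        bins[dst // bin_width].append((dst, contrib[src]))

    # Phase 2 (accumulate): drain one bin at a time. All updates in a bin
    # fall inside a single bin_width-sized slice of the output array.
    sums = [0.0] * num_vertices
    for updates in bins:
        for dst, value in updates:
            sums[dst] += value
    return sums


def scatter_add_direct(edges, contrib, num_vertices):
    """Baseline: apply each update immediately, with irregular writes."""
    sums = [0.0] * num_vertices
    for src, dst in edges:
        sums[dst] += contrib[src]
    return sums
```

Both functions compute the same result; the PB variant trades extra sequential traffic to the bins for accumulate-phase accesses that are confined to a small range of the output.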