Quiver: Supporting GPUs for Low-Latency, High-Throughput GNN Serving with Workload Awareness

by Zeyuan Tan, et al.

Systems for serving inference requests on graph neural networks (GNNs) must combine low latency with high throughput, but they face irregular computation due to skew in the number of sampled graph nodes and aggregated GNN features. This makes it challenging to exploit GPUs effectively: using GPUs to sample only a few graph nodes yields lower performance than CPU-based sampling, and aggregating many features incurs high data movement costs between GPUs and CPUs. Therefore, current GNN serving systems use CPUs for graph sampling and feature aggregation, which limits throughput. We describe Quiver, a distributed GPU-based GNN serving system with low latency and high throughput. Quiver's key idea is to exploit workload metrics to predict the irregular computation of GNN requests and to govern the use of GPUs for graph sampling and feature aggregation: (1) for graph sampling, Quiver calculates the probabilistic sampled graph size, a metric that predicts the degree of parallelism in graph sampling. Quiver uses this metric to assign sampling tasks to GPUs only when the performance gains surpass CPU-based sampling; and (2) for feature aggregation, Quiver relies on the feature access probability to decide which features to partition and replicate across a distributed GPU NUMA topology. We show that Quiver achieves up to 35 times lower latency with 8 times higher throughput compared to state-of-the-art GNN approaches (DGL and PyG).
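The sampling decision in (1) can be illustrated with a minimal sketch. It is not Quiver's implementation: the abstract does not define how the probabilistic sampled graph size is computed, so the estimate below (uniform neighbor sampling with a per-layer fanout, counting nodes with multiplicity), the function names `expected_sampled_size` and `choose_device`, and the `gpu_threshold` parameter are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence

# Adjacency list: node id -> list of neighbor ids (hypothetical representation).
Graph = Dict[int, List[int]]


def expected_sampled_size(graph: Graph, seed: int, fanouts: Sequence[int]) -> float:
    """Estimate how many nodes multi-hop neighbor sampling touches from `seed`.

    Assumption (not taken from the paper): each layer samples
    min(fanout, degree) neighbors uniformly without replacement, and nodes
    reached along several paths are counted once per path.
    """
    # frontier maps a node to the expected number of times it appears
    # in the current layer.
    frontier = {seed: 1.0}
    total = 1.0  # the seed itself
    for fanout in fanouts:
        next_frontier: Dict[int, float] = {}
        for node, weight in frontier.items():
            neighbors = graph.get(node, [])
            if not neighbors:
                continue
            picked = min(fanout, len(neighbors))
            # Each neighbor is selected with probability picked / degree.
            share = weight * picked / len(neighbors)
            for nb in neighbors:
                next_frontier[nb] = next_frontier.get(nb, 0.0) + share
        total += sum(next_frontier.values())
        frontier = next_frontier
    return total


def choose_device(seeds: Sequence[int], graph: Graph,
                  fanouts: Sequence[int], gpu_threshold: float) -> str:
    """Route a batch to the GPU only if the estimated work is large enough.

    `gpu_threshold` is a hypothetical tuning knob; in practice it would be
    calibrated by measuring CPU and GPU sampling latency on the target machine.
    """
    size = sum(expected_sampled_size(graph, s, fanouts) for s in seeds)
    return "gpu" if size >= gpu_threshold else "cpu"


if __name__ == "__main__":
    # Small star graph: node 0 is connected to nodes 1, 2 and 3.
    star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
    print(expected_sampled_size(star, 0, [2, 2]))
    print(choose_device([0], star, [2, 2], gpu_threshold=4.0))
```

The point of the sketch is only that a cheap estimate, computed before any sampling runs, can steer small requests to the CPU and large ones to the GPU. How the real metric is defined and how the switch-over point is chosen are left to the paper.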



Dorylus: Affordable, Scalable, and Accurate GNN Training with Distributed CPU Servers and Serverless Threads

A graph neural network (GNN) enables deep learning on structured graph d...

Accelerating Sampling and Aggregation Operations in GNN Frameworks with GPU Initiated Direct Storage Accesses

Graph Neural Networks (GNNs) are emerging as a powerful tool for learnin...

BGL: GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing

Graph neural networks (GNNs) have extended the success of deep neural ne...

Characterizing and Understanding Distributed GNN Training on GPUs

Graph neural network (GNN) has been demonstrated to be a powerful model ...

NextDoor: GPU-Based Graph Sampling for Graph Machine Learning

Representation learning is a fundamental task in machine learning. It co...

Low Latency Edge Classification GNN for Particle Trajectory Tracking on FPGAs

In-time particle trajectory reconstruction in the Large Hadron Collider ...

Hybrid Models for Learning to Branch

A recent Graph Neural Network (GNN) approach for learning to branch has ...
