Libra: In-network Gradient Aggregation for Speeding up Distributed Sparse Deep Training

by   Heng Pan, et al.

Distributed sparse deep learning has been widely used in many internet-scale applications, and network communication is one of the major hurdles for training performance. In-network gradient aggregation on programmable switches is a promising way to speed up training. Nevertheless, existing in-network aggregation solutions are designed for distributed dense deep training and fall short for sparse deep training. To address this gap, we present Libra, built on our key observation that parameter update frequencies in distributed sparse deep training are extremely biased. Specifically, Libra offloads only the aggregation of "hot" parameters, those updated frequently, onto programmable switches. To enable this offloading and achieve high aggregation throughput, we propose solutions to the challenges of hot parameter identification, parameter orchestration, floating-point summation on switches, and system reliability. We implemented Libra on Intel Tofino switches and integrated it with PS-lite. Finally, we evaluate Libra's performance through extensive experiments and show that Libra speeds up gradient aggregation by 1.5 to 4 times.
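The abstract names two building blocks that can be illustrated with a small sketch: identifying "hot" parameters by their update frequency, and summing gradients as fixed-point integers (a common workaround on switch ASICs like Tofino, which lack native floating-point arithmetic). The function names, the hot-fraction threshold, and the 16-bit scaling factor below are illustrative assumptions, not Libra's actual design.

```python
from collections import Counter

SCALE = 1 << 16  # assumed fixed-point scaling factor (16 fractional bits)

def to_fixed(x):
    """Quantize a float gradient to an integer so the switch can sum it."""
    return int(round(x * SCALE))

def from_fixed(v):
    """Convert an aggregated fixed-point sum back to a float."""
    return v / SCALE

def split_hot_cold(update_batches, hot_fraction=0.1):
    """Count per-parameter update frequency over sampled sparse batches and
    split ids into (hot, cold): hot ids would be offloaded to the switch,
    cold ids aggregated on the parameter servers."""
    freq = Counter()
    for batch in update_batches:
        freq.update(batch)
    ranked = [pid for pid, _ in freq.most_common()]
    k = max(1, int(len(ranked) * hot_fraction))
    return set(ranked[:k]), set(ranked[k:])

# Example: parameter id 7 appears in every sampled batch, so it is "hot".
batches = [[7, 1, 3], [7, 2], [7, 3, 5], [7, 9]]
hot, cold = split_hot_cold(batches, hot_fraction=0.2)

# Fixed-point aggregation of two workers' gradients for one hot parameter.
total = to_fixed(0.25) + to_fixed(0.5)
```

Because the switch only adds integers, workers quantize before sending and the result is de-quantized once, which keeps the in-switch pipeline free of floating-point logic.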



Unlocking the Power of Inline Floating-Point Operations on Programmable Switches

The advent of switches with programmable dataplanes has enabled the rapi...

S2 Reducer: High-Performance Sparse Communication to Accelerate Distributed Deep Learning

Distributed stochastic gradient descent (SGD) approach has been widely u...

Flare: Flexible In-Network Allreduce

The allreduce operation is one of the most commonly used communication r...

SwitchAgg: A Further Step Towards In-Network Computation

Many distributed applications adopt a partition/aggregation pattern to a...

Near-Optimal Sparse Allreduce for Distributed Deep Learning

Communication overhead is one of the major obstacles to train large deep...

Simplifying Distributed Neural Network Training on Massive Graphs: Randomized Partitions Improve Model Aggregation

Distributed training of GNNs enables learning on massive graphs (e.g., s...
