GradiVeQ: Vector Quantization for Bandwidth-Efficient Gradient Aggregation in Distributed CNN Training

11/08/2018
by Mingchao Yu, et al.

Data parallelism can boost the training speed of convolutional neural networks (CNNs), but it can suffer from significant communication costs caused by gradient aggregation. To alleviate this problem, several scalar quantization techniques have been developed to compress the gradients. However, these techniques can perform poorly when used together with decentralized aggregation protocols such as ring all-reduce (RAR), mainly because they cannot directly aggregate compressed gradients. In this paper, we empirically demonstrate strong linear correlations between CNN gradients, and propose a gradient vector quantization technique, named GradiVeQ, that exploits these correlations through principal component analysis (PCA) for substantial gradient dimension reduction. GradiVeQ enables direct aggregation of compressed gradients, which allows us to build a distributed learning system that parallelizes GradiVeQ gradient compression and RAR communications. Extensive experiments on popular CNNs demonstrate that applying GradiVeQ slashes the wall-clock gradient aggregation time of the original RAR by more than 5X without noticeable accuracy loss, and reduces the end-to-end training time by almost 50%. Our results also show that GradiVeQ is compatible with scalar quantization techniques such as QSGD (Quantized SGD), and achieves a much higher speed-up gain under the same compression ratio.
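The property that makes direct aggregation possible is that PCA-based vector quantization is a linear map: the sum of compressed gradients equals the compression of the summed gradient, so workers can add codes inside a decentralized protocol like ring all-reduce and decompress only once at the end. The NumPy sketch below illustrates this linearity under stated assumptions; the orthonormal basis U, the dimensions, and the compress/decompress helpers are illustrative placeholders, not the paper's implementation (in practice the PCA basis would be learned from previously observed gradients).

```python
import numpy as np

# Minimal sketch of PCA-based gradient vector quantization (illustrative,
# not the paper's exact algorithm). Assumption: all workers share the same
# orthonormal basis U of shape (d, k), with k << d.

def compress(grad, U):
    """Project a length-d gradient slice onto k principal directions."""
    return U.T @ grad          # shape (k,)

def decompress(code, U):
    """Map a k-dimensional code back to the original d-dimensional space."""
    return U @ code            # shape (d,)

rng = np.random.default_rng(0)
d, k, workers = 512, 16, 4

# Hypothetical shared basis with orthonormal columns (stand-in for a learned PCA basis).
U, _ = np.linalg.qr(rng.standard_normal((d, k)))

grads = [rng.standard_normal(d) for _ in range(workers)]
codes = [compress(g, U) for g in grads]

# Because compression is linear, codes can be summed directly
# (e.g. hop by hop inside a ring all-reduce) and decompressed once.
aggregated_code = np.sum(codes, axis=0)
approx_sum = decompress(aggregated_code, U)

# Sanity check: summing codes equals compressing the summed gradient.
exact_code = compress(np.sum(grads, axis=0), U)
assert np.allclose(aggregated_code, exact_code)

true_sum = np.sum(grads, axis=0)
print("relative error vs. true gradient sum:",
      np.linalg.norm(approx_sum - true_sum) / np.linalg.norm(true_sum))
```

The assertion is the point of the sketch: aggregation commutes with compression, which is exactly what scalar quantizers lack when their codes cannot be added without first decompressing.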


Related research

- Quantization for Distributed Optimization (09/26/2021): Massive amounts of data have led to the training of large-scale machine ...
- Natural Compression for Distributed Deep Learning (05/27/2019): Due to their hunger for big data, modern deep learning models are traine...
- Communication-Efficient Distributed Blockwise Momentum SGD with Error-Feedback (05/27/2019): Communication overhead is a major bottleneck hampering the scalability o...
- PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization (05/31/2019): We study gradient compression methods to alleviate the communication bot...
- MetaGrad: Adaptive Gradient Quantization with Hypernetworks (03/04/2023): A popular track of network compression approach is Quantization aware Tr...
- Accurate Deep Representation Quantization with Gradient Snapping Layer for Similarity Search (10/30/2016): Recent advance of large scale similarity search involves using deeply le...
- Bandwidth Reduction using Importance Weighted Pruning on Ring AllReduce (01/06/2019): It is inevitable to train large deep learning models on a large-scale cl...
