Unbiased Compression Saves Communication in Distributed Optimization: When and How Much?

by Yutong He, et al.

Communication compression is a common technique in distributed optimization that alleviates communication overhead by transmitting compressed gradients and model parameters. However, compression introduces information distortion, which slows down convergence and requires more communication rounds to reach a desired solution. Given the trade-off between lower per-round communication costs and additional rounds of communication, it is unclear whether communication compression reduces the total communication cost. This paper explores the conditions under which unbiased compression, a widely used form of compression, can reduce the total communication cost, and the extent to which it can do so. To this end, we present the first theoretical formulation for characterizing the total communication cost in distributed optimization with communication compression. We demonstrate that unbiased compression alone does not necessarily save total communication cost, but that savings can be achieved if the compressors used by all workers are further assumed to be independent. We establish lower bounds on the number of communication rounds required by algorithms using independent unbiased compressors to minimize smooth convex functions, and show that these lower bounds are tight by refining the analysis of ADIANA. Our results reveal that independent unbiased compression can reduce the total communication cost by a factor of up to Θ(√(min{n, κ})), where n is the number of workers and κ is the condition number of the functions being minimized. These theoretical findings are supported by experimental results.
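To make the role of independence concrete, here is a minimal sketch, not the paper's construction or its ADIANA analysis. It uses random-k sparsification as one example of an unbiased compressor; the abstract does not name specific compressors, so this choice, the function names (`rand_k`, `averaged_error`, and helpers), and the simplification that every worker compresses the same vector are all assumptions for illustration. Rand-k sends only k values (plus their indices) instead of d values per round, and scaling by d/k keeps it unbiased. The sketch compares the server-side averaging error when workers draw independent masks versus one shared mask.

```python
import random


def rand_k(x, k, rng):
    """Random-k sparsification (illustrative unbiased compressor): keep k
    coordinates chosen uniformly at random and scale them by d/k, so that
    E[C(x)] = x and E||C(x) - x||^2 = (d/k - 1) * ||x||^2."""
    d = len(x)
    out = [0.0] * d
    for i in rng.sample(range(d), k):
        out[i] = x[i] * d / k
    return out


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean_vec(vectors):
    """Coordinate-wise average, as a server would compute after aggregation."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def averaged_error(x, k, n, trials, independent, seed=0):
    """Monte Carlo estimate of E||(1/n) * sum_i C_i(x) - x||^2.

    Simplifying assumption: all n workers hold the same vector x.
    independent=True  -> each worker draws its own random mask.
    independent=False -> all workers share one mask (fully correlated).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        if independent:
            outs = [rand_k(x, k, rng) for _ in range(n)]
        else:
            # Shared mask: averaging identical vectors cancels no compression noise.
            shared = rand_k(x, k, rng)
            outs = [shared] * n
        total += sq_dist(mean_vec(outs), x)
    return total / trials


if __name__ == "__main__":
    x = [1.0] * 10
    for flag in (False, True):
        err = averaged_error(x, k=2, n=8, trials=2000, independent=flag)
        print(f"independent={flag}: {err:.3f}")
```

For the all-ones vector with d = 10 and k = 2, the shared-mask error equals (d/k - 1)||x||^2 = 40 in every trial, while the independent-mask error comes out close to 40/n = 5 for n = 8. This 1/n variance reduction is the intuition for why independence matters; it does not by itself reproduce the paper's round-complexity bounds.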


Lower Bounds and Nearly Optimal Algorithms in Distributed Learning with Communication Compression

Recent advances in distributed optimization and learning have shown that...

Lower Bounds and Accelerated Algorithms in Distributed Stochastic Optimization with Communication Compression

Communication compression is an essential strategy for alleviating commu...

Communication trade-offs for synchronized distributed SGD with large step size

Synchronous mini-batch SGD is state-of-the-art for large-scale distribut...

A Distributed Frank-Wolfe Algorithm for Communication-Efficient Sparse Learning

Learning sparse combinations is a frequent theme in machine learning. In...

Uncertainty Principle for Communication Compression in Distributed and Federated Learning and the Search for an Optimal Compressor

In order to mitigate the high communication cost in distributed and fede...

On the Discrepancy between the Theoretical Analysis and Practical Implementations of Compressed Communication for Distributed Deep Learning

Compressed communication, in the form of sparsification or quantization ...

Permutation Compressors for Provably Faster Distributed Nonconvex Optimization

We study the MARINA method of Gorbunov et al. (2021) – the current state-...
