Communication Efficient Decentralized Training with Multiple Local Updates

10/21/2019
by Xiang Li, et al.

Communication efficiency plays a significant role in decentralized optimization, especially when the data is highly non-identically distributed. In this paper, we propose a novel algorithm, Periodic Decentralized SGD (PD-SGD), to reduce the communication cost in a decentralized heterogeneous network. PD-SGD alternates between multiple local updates and multiple decentralized communications, making communication more flexible and controllable. We prove that PD-SGD converges at a rate of O(1/√(nT)) in the setting of stochastic non-convex optimization with non-i.i.d. data, where n is the number of worker nodes. We also propose a novel decay strategy that periodically shrinks the length of the local-update phase. PD-SGD equipped with this strategy better balances the communication-convergence trade-off, both theoretically and empirically.
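The alternation described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the scalar quadratic local objectives, the ring mixing matrix, the halving rule for the decay strategy, and every name and default value (ring_mixing_matrix, pd_sgd, local_steps, comm_steps, decay_every) are assumptions chosen only to make the idea concrete. The paper itself addresses non-convex objectives.

```python
import random


def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix for a ring topology (an assumed choice):
    each node averages itself and its two neighbours with weight 1/3."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] += 1.0 / 3.0
    return W


def pd_sgd(targets, local_steps=4, comm_steps=2, rounds=50, lr=0.1,
           decay_every=10, noise=0.01, seed=0):
    """Toy periodic decentralized SGD on scalar parameters.

    Worker i holds the hypothetical local objective 0.5 * (x - targets[i])**2,
    so differing targets stand in for non-i.i.d. data. Each round runs
    `local_steps` noisy local gradient steps and then `comm_steps` gossip
    averaging steps with the neighbours on the ring.
    """
    rng = random.Random(seed)
    n = len(targets)
    W = ring_mixing_matrix(n)
    x = [0.0] * n  # one scalar parameter per worker

    for r in range(rounds):
        # Assumed decay rule: halve the local-update length every
        # `decay_every` rounds, never going below one step.
        if decay_every and r > 0 and r % decay_every == 0:
            local_steps = max(1, local_steps // 2)

        # Phase 1: multiple local SGD updates, no communication.
        for _ in range(local_steps):
            x = [xi - lr * ((xi - ci) + rng.gauss(0.0, noise))
                 for xi, ci in zip(x, targets)]

        # Phase 2: multiple decentralized communications (gossip averaging).
        for _ in range(comm_steps):
            x = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]

    return x


if __name__ == "__main__":
    params = pd_sgd([1.0, 2.0, 3.0, 4.0])
    print("worker parameters:", params)
    print("network average:", sum(params) / len(params))
```

In this toy setting the minimizer of the averaged objective is the mean of the targets. Because the mixing matrix is doubly stochastic, gossip steps preserve the network average, while longer local phases let the workers drift apart. Raising comm_steps or lowering local_steps trades more communication for tighter agreement between workers, which is the trade-off the decay strategy is meant to manage.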
