Network-Density-Controlled Decentralized Parallel Stochastic Gradient Descent in Wireless Systems
This paper proposes a communication strategy for decentralized learning on wireless systems. Our discussion is based on the decentralized parallel stochastic gradient descent (D-PSGD), which is one of the state-of-the-art algorithms for decentralized learning. The main contribution of this paper is to raise a novel open question for decentralized learning on wireless systems: there is a possibility that the density of a network topology significantly influences the runtime performance of D-PSGD. In general, it is difficult to guarantee delay-free communications without any communication deterioration in real wireless network systems because of path loss and multi-path fading. These factors significantly degrade the runtime performance of D-PSGD. To alleviate such problems, we first analyze the runtime performance of D-PSGD by considering real wireless systems. This analysis yields the key insights that dense network topology (1) does not significantly gain the training accuracy of D-PSGD compared to sparse one, and (2) strongly degrades the runtime performance because this setting generally requires to utilize a low-rate transmission. Based on these findings, we propose a novel communication strategy, in which each node estimates optimal transmission rates such that communication time during the D-PSGD optimization is minimized under the constraint of network density, which is characterized by radio propagation property. The proposed strategy enables to improve the runtime performance of D-PSGD in wireless systems. Numerical simulations reveal that the proposed strategy is capable of enhancing the runtime performance of D-PSGD.
READ FULL TEXT