Toward Understanding the Impact of Staleness in Distributed Machine Learning

10/08/2018
by   Wei Dai, et al.
2

Many distributed machine learning (ML) systems adopt the non-synchronous execution in order to alleviate the network communication bottleneck, resulting in stale parameters that do not reflect the latest updates. Despite much development in large-scale ML, the effects of staleness on learning are inconclusive as it is challenging to directly monitor or control staleness in complex distributed environments. In this work, we study the convergence behaviors of a wide array of ML models and algorithms under delayed updates. Our extensive experiments reveal the rich diversity of the effects of staleness on the convergence of ML algorithms and offer insights into seemingly contradictory reports in the literature. The empirical findings also inspire a new convergence analysis of stochastic gradient descent in non-convex optimization under staleness, matching the best-known convergence rate of O(1/√(T)).

READ FULL TEXT

page 14

page 15

page 16

research
05/30/2019

On the Convergence of Memory-Based Distributed SGD

Distributed stochastic gradient descent (DSGD) has been widely used for ...
research
09/29/2017

Convergence Analysis of Distributed Stochastic Gradient Descent with Shuffling

When using stochastic gradient descent to solve large-scale machine lear...
research
09/09/2019

Communication-Censored Distributed Stochastic Gradient Descent

This paper develops a communication-efficient algorithm to solve the sto...
research
03/09/2021

Proof-of-Learning: Definitions and Practice

Training machine learning (ML) models typically involves expensive itera...
research
12/19/2013

Structure-Aware Dynamic Scheduler for Parallel Machine Learning

Training large machine learning (ML) models with many variables or param...
research
12/08/2014

MLitB: Machine Learning in the Browser

With few exceptions, the field of Machine Learning (ML) research has lar...
research
04/22/2022

Reward Reports for Reinforcement Learning

The desire to build good systems in the face of complex societal effects...

Please sign up or login with your details

Forgot password? Click here to reset