
A Robust Classification Framework for Byzantine-Resilient Stochastic Gradient Descent

by Shashank Reddy Chirra, et al.
IIIT Bangalore

This paper proposes a Robust Gradient Classification Framework (RGCF) for Byzantine fault tolerance in distributed stochastic gradient descent. The framework consists of a pattern recognition filter that we train to classify individual gradients as Byzantine using their direction alone. This filter is robust to an arbitrary number of Byzantine workers in both convex and non-convex optimisation settings, a significant improvement on prior work, which is robust to Byzantine faults only when up to 50% of the workers are Byzantine. The solution does not require an estimate of the number of Byzantine workers. Its running time does not depend on the number of workers, so it scales to training instances with a large number of workers without a loss in performance. We validate our solution by training convolutional neural networks on the MNIST dataset in the presence of Byzantine workers.
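To make the idea of per-gradient filtering concrete, here is a minimal sketch of an aggregation step that keeps or discards each worker's gradient based on its direction alone. It is not the RGCF filter itself: the abstract describes a trained pattern recognition filter, while this sketch swaps in a simple cosine-similarity test against a reference direction as a stand-in classifier. The names `unit_direction`, `filter_and_average`, `make_cosine_stand_in`, and the `threshold` parameter are all hypothetical.

```python
import numpy as np


def unit_direction(g, eps=1e-12):
    """Discard the magnitude so that only the gradient's direction remains."""
    return g / (np.linalg.norm(g) + eps)


def filter_and_average(gradients, is_byzantine):
    """Average only the gradients whose direction the classifier accepts.

    Each gradient is judged on its own, so no estimate of the number of
    Byzantine workers is needed.
    """
    kept = [g for g in gradients if not is_byzantine(unit_direction(g))]
    if not kept:
        # Nothing accepted: return a zero update rather than a poisoned one.
        return np.zeros_like(gradients[0])
    return np.mean(kept, axis=0)


def make_cosine_stand_in(reference, threshold=0.0):
    """Stand-in classifier (an assumption, not the paper's trained filter).

    Flags a direction as Byzantine when its cosine similarity with a
    reference direction falls below `threshold`.
    """
    ref = unit_direction(reference)

    def is_byzantine(direction):
        return float(direction @ ref) < threshold

    return is_byzantine


# Toy round: 3 honest workers and 7 Byzantine workers (a Byzantine majority).
rng = np.random.default_rng(0)
true_grad = np.array([1.0, -2.0, 0.5])
honest = [true_grad + 0.1 * rng.standard_normal(3) for _ in range(3)]
byzantine = [-5.0 * true_grad for _ in range(7)]

aggregate = filter_and_average(honest + byzantine, make_cosine_stand_in(true_grad))
plain_mean = np.mean(honest + byzantine, axis=0)
```

In this toy round the plain mean points away from the true gradient because the Byzantine workers are in the majority, while the filtered aggregate averages only the honest gradients. In the setting the abstract describes, the stand-in classifier would be replaced by the trained filter.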


