DISCO: Distributed Inference with Sparse Communications

by Minghai Qin, et al.

Deep neural networks (DNNs) have great potential to solve many real-world problems, but they usually require extensive computation and memory. Deploying a large DNN model on a single resource-limited device with small memory capacity is difficult. Distributed computing is a common approach to reduce per-node memory consumption and to accelerate the inference of DNN models. In this paper, we explore "within-layer model parallelism", which distributes the inference of each layer across multiple nodes. In this way, the memory requirement is spread over many nodes, making it possible to use several edge devices to infer a large DNN model. However, because of data dependencies within each layer, inter-node communication during parallel inference can become a bottleneck when the communication bandwidth is limited. We propose a framework to train DNN models for Distributed Inference with Sparse Communications (DISCO). We convert the problem of selecting which subset of data to transmit between nodes into a model optimization problem, and derive models that reduce both computation and communication when each layer is inferred across multiple nodes. We demonstrate the benefit of the DISCO framework on a variety of computer vision tasks, including image classification, object detection, semantic segmentation, and image super-resolution. The corresponding models cover important DNN building blocks such as convolutions and transformers. For example, each layer of a ResNet-50 model can be distributively inferred across two nodes with five times less inter-node communication, nearly half the overall computation, and half the per-node memory requirement, while achieving accuracy comparable to the original ResNet-50 model. This translates to a 4.7 times overall inference speedup.
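The idea of within-layer parallelism with sparse communication can be sketched as follows. This is a toy NumPy illustration under stated assumptions, not the paper's actual training method: each of two "nodes" holds half of a linear layer's output neurons, and a simple top-k magnitude filter stands in for the learned communication reduction that DISCO derives via model optimization.

```python
import numpy as np

def split_linear_forward(x, w1, w2, k):
    """Infer one linear layer across two simulated nodes.

    Each node holds half of the output neurons (weights w1, w2) and the
    full input x. Instead of exchanging all activations for the next
    layer, each node transmits only its k largest-magnitude outputs; the
    rest are zeroed. (Hypothetical stand-in for DISCO's learned masks.)
    """
    # Node 0 and node 1 each compute their half of the layer locally.
    y1 = np.maximum(x @ w1, 0.0)  # ReLU half on node 0
    y2 = np.maximum(x @ w2, 0.0)  # ReLU half on node 1

    def sparsify(y, k):
        # Keep only the k largest-magnitude entries; zero the rest.
        out = np.zeros_like(y)
        idx = np.argsort(np.abs(y))[-k:]
        out[idx] = y[idx]
        return out

    # Only the sparsified halves cross the inter-node link, so the
    # communication volume drops from len(y) to at most k per node.
    sent1, sent2 = sparsify(y1, k), sparsify(y2, k)
    # Each node can now assemble the full (approximate) activation vector.
    return np.concatenate([sent1, sent2])

# Example usage with random weights (illustrative only):
rng = np.random.default_rng(0)
x = rng.normal(size=8)
w1 = rng.normal(size=(8, 4))
w2 = rng.normal(size=(8, 4))
y = split_linear_forward(x, w1, w2, k=2)  # at most 2 values sent per node
```

In the actual framework, which entries to transmit is not a fixed top-k heuristic but is learned jointly with the model weights, which is why accuracy can be preserved despite the reduced communication.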




Related research

DEFER: Distributed Edge Inference for Deep Neural Networks

ScissionLite: Accelerating Distributed Deep Neural Networks Using Transfer Layer

Guardians of the Deep Fog: Failure-Resilient DNN Inference from Edge to Cloud

Compact Multi-level Sparse Neural Networks with Input Independent Dynamic Rerouting

Restructuring, Pruning, and Adjustment of Deep Models for Parallel Distributed Inference

A Framework for Routing DNN Inference Jobs over Distributed Computing Networks

An OpenCL 3D FFT for Molecular Dynamics Distributed Across Multiple FPGAs
