DAPPLE: A Pipelined Data Parallel Approach for Training Large Models

07/02/2020
by Shiqing Fan, et al.

It is a challenging task to train large DNN models on sophisticated GPU platforms with diversified interconnect capabilities. Recently, pipelined training has been proposed as an effective approach for improving device utilization. However, several tricky issues remain: improving computing efficiency while ensuring convergence, and reducing memory usage without incurring additional computing costs. We propose DAPPLE, a synchronous training framework which combines data parallelism and pipeline parallelism for large DNN models. It features a novel parallelization strategy planner that solves the partition and placement problems and explores the optimal hybrid strategy of data and pipeline parallelism. We also propose a new runtime scheduling algorithm to reduce device memory usage, which is orthogonal to the re-computation approach and does not come at the expense of training throughput. Experiments show that the DAPPLE planner consistently outperforms strategies generated by PipeDream's planner by up to 3.23x under synchronous training scenarios, and the DAPPLE runtime outperforms GPipe with a 1.6x speedup in training throughput while reducing memory consumption by 12%.

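The abstract only names the runtime scheduling idea; as a rough, self-contained sketch of why an early-backward (1F1B-style) schedule caps per-stage activation memory at roughly the pipeline depth, rather than at the number of micro-batches as a GPipe-style schedule does, the toy Python below generates per-stage operation orders and counts peak stashed activations. The function names, the warm-up sizing, the unit-cost steps, and the absence of communication are illustrative assumptions, not DAPPLE's actual runtime implementation.

from typing import List, Tuple

def early_backward_schedule(num_stages: int, num_micro_batches: int,
                            stage: int) -> List[Tuple[str, int]]:
    """Per-stage operation order for an early-backward (1F1B-style) schedule.

    Stage `stage` (0-indexed) first runs a warm-up of forward micro-batches,
    then strictly alternates backward/forward, and finally drains the
    remaining backward passes. Returns a list of ("F"|"B", micro_batch_id).
    """
    warmup = min(num_stages - stage, num_micro_batches)
    ops: List[Tuple[str, int]] = [("F", m) for m in range(warmup)]
    next_fwd, next_bwd = warmup, 0
    while next_bwd < num_micro_batches:
        ops.append(("B", next_bwd)); next_bwd += 1
        if next_fwd < num_micro_batches:
            ops.append(("F", next_fwd)); next_fwd += 1
    return ops

def gpipe_schedule(num_micro_batches: int) -> List[Tuple[str, int]]:
    """GPipe-style order: all forward micro-batches first, then all backwards."""
    return ([("F", m) for m in range(num_micro_batches)] +
            [("B", m) for m in range(num_micro_batches)])

def peak_stashed_activations(ops: List[Tuple[str, int]]) -> int:
    """Peak number of micro-batches whose activations are held at once."""
    live = peak = 0
    for kind, _ in ops:
        live += 1 if kind == "F" else -1
        peak = max(peak, live)
    return peak

if __name__ == "__main__":
    S, M = 4, 8  # 4 pipeline stages, 8 micro-batches per mini-batch (toy sizes)
    print(f"GPipe-style peak stashed activations per stage: "
          f"{peak_stashed_activations(gpipe_schedule(M))}")
    for s in range(S):
        sched = early_backward_schedule(S, M, s)
        print(f"stage {s}: peak={peak_stashed_activations(sched)}  "
              f"order={' '.join(k + str(m) for k, m in sched)}")

With 4 stages and 8 micro-batches, the GPipe-style order peaks at 8 stashed micro-batches on every stage, while the early-backward order peaks at 4 on the first stage and only 1 on the last. This is the kind of per-stage memory reduction the abstract refers to, achieved by reordering work rather than by re-computation, so it need not trade away training throughput.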
