Adaptive Gradient Prediction for DNN Training

05/22/2023
by Vahid Janfaza, et al.

Neural network training is inherently sequential: layers complete their forward propagation in succession, after which gradients (based on a loss function) are computed and back-propagated starting from the last layer. These sequential computations significantly slow down training, especially for deeper networks. Prediction has been used successfully in many areas of computer architecture to speed up sequential processing. We therefore propose ADA-GP, which uses gradient prediction adaptively to speed up deep neural network (DNN) training while maintaining accuracy. ADA-GP incorporates a small neural network to predict gradients for the different layers of a DNN model, and uses a novel tensor reorganization to make predicting a large number of gradients feasible. ADA-GP alternates between DNN training using back-propagated gradients and DNN training using predicted gradients, adaptively adjusting when and for how long gradient prediction is used to strike a balance between accuracy and performance. Last but not least, we provide a detailed hardware extension to a typical DNN accelerator to realize the speedup potential of gradient prediction. Our extensive experiments with fourteen DNN models show that ADA-GP achieves an average speedup of 1.47x with accuracy similar to, or even higher than, the baseline models. Moreover, it incurs, on average, 34% fewer off-chip memory accesses than the baseline hardware accelerator.
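
To make the alternation scheme concrete, here is a minimal, purely illustrative PyTorch sketch of the idea for a single linear layer. All names, dimensions, the batch-mean activation used as the predictor's input feature, and the fixed phase schedule are assumptions for illustration, not the paper's implementation (which targets a hardware accelerator and uses a tensor reorganization not shown here).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# The DNN layer being trained, and a small network that predicts
# that layer's flattened weight gradient from a summary of the input.
layer = nn.Linear(16, 4)
predictor = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 16 * 4))

opt = torch.optim.SGD(layer.parameters(), lr=0.05)
pred_opt = torch.optim.Adam(predictor.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

PHASE_LEN = 5  # assumed schedule: alternate every 5 steps (ADA-GP adapts this)

for step in range(100):
    x = torch.randn(32, 16)          # stand-in training batch
    y = torch.randn(32, 4)
    predict_phase = (step // PHASE_LEN) % 2 == 1

    if not predict_phase:
        # Backpropagation phase: train the layer with true gradients ...
        opt.zero_grad()
        loss = loss_fn(layer(x), y)
        loss.backward()
        opt.step()
        # ... and supervise the gradient predictor with those gradients.
        target = layer.weight.grad.detach().flatten()
        pred = predictor(x.mean(dim=0))
        pred_loss = loss_fn(pred, target)
        pred_opt.zero_grad()
        pred_loss.backward()
        pred_opt.step()
    else:
        # Prediction phase: apply the predicted gradient directly and skip
        # the backward pass (only the weight is updated in this sketch;
        # the bias is trained during backpropagation phases).
        with torch.no_grad():
            g = predictor(x.mean(dim=0)).view(layer.weight.shape)
            layer.weight -= opt.param_groups[0]["lr"] * g
```

In this sketch the backpropagation phases do double duty, updating the model and teaching the predictor, while prediction phases avoid the backward pass entirely; the adaptive policy for choosing when and how long to predict is what ADA-GP adds on top of this basic alternation.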

Related research

09/06/2020 · TaxoNN: A Light-Weight Accelerator for Deep Neural Network Training
Emerging intelligent embedded devices rely on Deep Neural Networks (DNNs...

09/06/2020 · HLSGD: Hierarchical Local SGD With Stale Gradients Featuring
While distributed training significantly speeds up the training process ...

05/26/2023 · XGrad: Boosting Gradient-Based Optimizers With Weight Prediction
In this paper, we propose a general deep learning training framework XGr...

02/01/2023 · Weight Prediction Boosts the Convergence of AdamW
In this paper, we introduce weight prediction into the AdamW optimizer t...

02/14/2018 · Field-Programmable Deep Neural Network (DNN) Learning and Inference accelerator: a concept
An accelerator is a specialized integrated circuit designed to perform s...

10/07/2021 · Shift-BNN: Highly-Efficient Probabilistic Bayesian Neural Network Training via Memory-Friendly Pattern Retrieving
Bayesian Neural Networks (BNNs) that possess a property of uncertainty e...

05/31/2017 · Sequential Dynamic Decision Making with Deep Neural Nets on a Test-Time Budget
Deep neural network (DNN) based approaches hold significant potential fo...
