ApproxTrain: Fast Simulation of Approximate Multipliers for DNN Training and Inference

by Jing Gong, et al.

Edge training of Deep Neural Networks (DNNs) is a desirable goal for continuous learning; however, it is hindered by the enormous computational power that training requires. Hardware approximate multipliers have proven effective for improving resource efficiency in DNN inference accelerators, yet training with approximate multipliers remains largely unexplored. Building resource-efficient accelerators with approximate multipliers that support DNN training requires a thorough evaluation of training convergence and accuracy across different DNN architectures and different approximate multipliers. This paper presents ApproxTrain, an open-source framework for fast evaluation of DNN training and inference with simulated approximate multipliers. ApproxTrain is as user-friendly as TensorFlow (TF), requiring only a high-level description of the DNN architecture along with C/C++ functional models of the approximate multiplier. At the multiplier level, simulation speed is improved by a novel LUT-based approximate floating-point (FP) multiplier simulator on GPU (AMSim). ApproxTrain leverages CUDA to integrate AMSim efficiently into the TensorFlow library, overcoming the absence of native hardware approximate multipliers in commercial GPUs. We use ApproxTrain to evaluate the convergence and accuracy of DNN training with approximate multipliers on small and large datasets (including ImageNet) using LeNet and ResNet architectures. The evaluations demonstrate similar convergence behavior and a negligible change in test accuracy compared to FP32 and bfloat16 multipliers. For both training and inference, the GPU-accelerated ApproxTrain is more than 2500x faster than CPU-based approximate multiplier simulations; the original TensorFlow, built on the highly optimized closed-source cuDNN/cuBLAS libraries with native hardware multipliers, is only 8x faster than ApproxTrain.
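To illustrate the general idea behind LUT-based simulation of an approximate FP multiplier (not the actual AMSim implementation, which the abstract only describes at a high level), the sketch below splits each FP32 operand into sign, exponent, and mantissa, handles sign and exponent exactly, and looks up the mantissa product in a table indexed by the top K bits of each mantissa. All names (`build_lut`, `approx_mul`, the choice of K, and the use of an exact product in the table) are illustrative assumptions; a real functional model would fill the table with the outputs of the approximate hardware multiplier being simulated.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical sketch of a LUT-based approximate FP32 multiplier.
// The mantissa product is precomputed once for every pair of
// truncated mantissas, so each simulated multiplication becomes a
// single table lookup plus exact sign/exponent handling.

constexpr int K = 8;                       // mantissa bits kept per operand
constexpr uint32_t LUT_SIZE = 1u << (2 * K);

std::vector<float> build_lut() {
    std::vector<float> lut(LUT_SIZE);
    for (uint32_t a = 0; a < (1u << K); ++a) {
        for (uint32_t b = 0; b < (1u << K); ++b) {
            // Implicit leading 1: each mantissa value lies in [1, 2).
            double ma = 1.0 + a / double(1u << K);
            double mb = 1.0 + b / double(1u << K);
            // Exact product used here for illustration; a real model
            // would store the approximate multiplier's output instead.
            lut[(a << K) | b] = float(ma * mb);
        }
    }
    return lut;
}

float approx_mul(float x, float y, const std::vector<float>& lut) {
    // Note: zeros, denormals, Inf, and NaN are not handled in this sketch.
    uint32_t bx, by;
    std::memcpy(&bx, &x, 4);
    std::memcpy(&by, &y, 4);
    uint32_t sign = (bx ^ by) & 0x80000000u;            // exact sign
    int ex = int((bx >> 23) & 0xFF) - 127;              // unbiased exponents
    int ey = int((by >> 23) & 0xFF) - 127;
    uint32_t ia = (bx >> (23 - K)) & ((1u << K) - 1);   // top K mantissa bits
    uint32_t ib = (by >> (23 - K)) & ((1u << K) - 1);
    float m = lut[(ia << K) | ib];                      // approximate mantissa product
    float r = std::ldexp(m, ex + ey);                   // apply combined exponent
    uint32_t br;
    std::memcpy(&br, &r, 4);
    br |= sign;                                         // r >= 0 here, so OR is safe
    std::memcpy(&r, &br, 4);
    return r;
}
```

Truncating to K mantissa bits keeps the table small (a 2^(2K)-entry array fits in GPU memory for K = 8), which is what makes this kind of lookup amenable to the GPU acceleration the abstract describes.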


ALWANN: Automatic Layer-Wise Approximation of Deep Neural Network Accelerators without Retraining

The state-of-the-art approaches employ approximate computing to improve ...

AdaPT: Fast Emulation of Approximate DNN Accelerators in PyTorch

Current state-of-the-art employs approximate multipliers to address the ...

TFApprox: Towards a Fast Emulation of DNN Approximate Hardware Accelerators on GPU

Energy efficiency of hardware accelerators of deep neural networks (DNN)...

Deep Learning Training with Simulated Approximate Multipliers

This paper presents by simulation how approximate multipliers can be uti...

PLAM: a Posit Logarithm-Approximate Multiplier for Power Efficient Posit-based DNNs

The Posit Number System was introduced in 2017 as a replacement for floa...

HADES: Hardware/Algorithm Co-design in DNN accelerators using Energy-efficient Approximate Alphabet Set Multipliers

Edge computing must be capable of executing computationally intensive al...

Ultra-Fast, High-Performance 8x8 Approximate Multipliers by a New Multicolumn 3,3:2 Inexact Compressor and its Derivatives

Multiplier, as a key role in many different applications, is a time-cons...
