Efficient Softmax Approximation for Deep Neural Networks with Attention Mechanism

11/21/2021
by Ihor Vasyltsov, et al.

There has been rapid progress in custom hardware (HW) for accelerating the inference speed of deep neural networks (DNNs). Previously, the softmax layer was not a major concern for DNN-accelerating HW, because it accounts for only a small fraction of the computation in multi-layer perceptrons and convolutional neural networks. However, as attention mechanisms are now widely used in modern DNNs, a cost-efficient implementation of the softmax layer is becoming increasingly important. In this paper, we propose two methods for approximating the softmax computation based on LookUp Tables (LUTs). The required LUT size is quite small (about 700 bytes) because the ranges of the softmax numerators and denominators remain stable when normalization is applied to the input. We validated the proposed technique on different AI tasks (object detection, machine translation, sentiment analysis, and semantic equivalence) and DNN models (DETR, Transformer, BERT) across a variety of benchmarks (COCO17, WMT14, WMT17, GLUE). We show that the 8-bit approximation keeps the accuracy loss below 1.0%.
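To make the idea in the abstract concrete, the sketch below shows a generic LUT-based softmax approximation: after subtracting the row maximum, every exponent argument falls into a fixed range, so a small precomputed exponential table can replace the exp() evaluation. The table size, clamp range, and floating-point division used here are illustrative assumptions, not the paper's exact two LUT designs or its ~700-byte budget.

```python
import numpy as np

# Illustrative LUT-based softmax approximation (assumptions: 8-bit index,
# clamp range X_MIN, float32 table, exact division for the denominator).

EXP_LUT_BITS = 8                  # 8-bit index into the exponential table
EXP_LUT_SIZE = 1 << EXP_LUT_BITS  # 256 entries
X_MIN = -8.0                      # inputs below this contribute ~0 after exp()

# Precompute exp() over the normalized input range [X_MIN, 0].
# Max-subtraction guarantees every numerator argument lies in this range,
# which is what keeps the table small.
_exp_table = np.exp(np.linspace(X_MIN, 0.0, EXP_LUT_SIZE)).astype(np.float32)

def lut_softmax(x: np.ndarray) -> np.ndarray:
    """Approximate softmax along the last axis using an 8-bit exp LUT."""
    # Normalization: shift so the row maximum becomes 0.
    z = x - x.max(axis=-1, keepdims=True)
    # Clamp to the table range and quantize to an 8-bit index.
    z = np.clip(z, X_MIN, 0.0)
    idx = np.round((z - X_MIN) / (0.0 - X_MIN) * (EXP_LUT_SIZE - 1)).astype(np.int32)
    num = _exp_table[idx]
    return num / num.sum(axis=-1, keepdims=True)

if __name__ == "__main__":
    # Quick sanity check against the exact softmax.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 16)).astype(np.float32)
    exact = np.exp(x - x.max(-1, keepdims=True))
    exact /= exact.sum(-1, keepdims=True)
    print("max abs error:", np.abs(exact - lut_softmax(x)).max())
```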


Related research

Adversarial Robustness: Softmax versus Openmax (08/05/2017)
Deep neural networks (DNNs) provide state-of-the-art results on various ...

Inhibited Softmax for Uncertainty Estimation in Neural Networks (10/03/2018)
We present a new method for uncertainty estimation and out-of-distributi...

Assessing Deep Neural Networks as Probability Estimators (11/16/2021)
Deep Neural Networks (DNNs) have performed admirably in classification t...

Weightless Neural Networks for Efficient Edge Inference (03/03/2022)
Weightless Neural Networks (WNNs) are a class of machine learning model ...

A^3: Accelerating Attention Mechanisms in Neural Networks with Approximation (02/22/2020)
With the increasing computational demands of neural networks, many hardw...

Escaping the Gradient Vanishing: Periodic Alternatives of Softmax in Attention Mechanism (08/16/2021)
Softmax is widely used in neural networks for multiclass classification,...

A Regularized Framework for Sparse and Structured Neural Attention (05/22/2017)
Modern neural networks are often augmented with an attention mechanism, ...
