Hardware-Guided Symbiotic Training for Compact, Accurate, yet Execution-Efficient LSTM

01/30/2019
by Hongxu Yin, et al.

Many long short-term memory (LSTM) applications need fast yet compact models. Neural network compression approaches, such as the grow-and-prune paradigm, have proved promising for cutting down network complexity by skipping insignificant weights. However, current compression strategies are mostly hardware-agnostic, and a reduction in network complexity does not always translate into execution efficiency. In this work, we propose a hardware-guided symbiotic training methodology for compact, accurate, yet execution-efficient inference models. It is based on our observation that hardware may introduce substantial non-monotonic behavior, which we call the latency hysteresis effect, in the relationship between network size and inference latency. This observation calls into question the mainstream smaller-dimension-is-better compression strategy, which often leads to a sub-optimal model architecture. By leveraging the hardware-induced hysteresis effect and sparsity, we achieve a symbiosis of model compactness and accuracy with execution efficiency, thus reducing LSTM latency while increasing its accuracy. We have evaluated our algorithms on language modeling and speech recognition applications. Relative to the traditional stacked LSTM architecture obtained for the Penn Treebank dataset, we reduce the number of parameters by 18.0x (30.5x) and measured run-time latency by up to 2.4x (5.2x) on Nvidia GPUs (Intel Xeon CPUs) without any accuracy degradation. For the DeepSpeech2 architecture obtained for the AN4 dataset, we reduce the number of parameters by 7.0x (19.4x) on Nvidia GPUs (Intel Xeon CPUs) and lower the word error rate from its 12.9 baseline. Thus, our method yields compact, accurate, yet execution-efficient inference models.
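The latency hysteresis effect is straightforward to probe empirically: one simply sweeps the hidden dimension of an LSTM layer and measures wall-clock inference latency at each size. Below is a minimal sketch of such a sweep, assuming PyTorch; the input size, batch size, sequence length, and size grid are illustrative assumptions, not values from the paper.

```python
# Sketch: measure inference latency vs. hidden size for a single LSTM layer,
# to look for the non-monotonic "latency hysteresis" jumps the paper describes.
# All hyperparameters below are illustrative, not taken from the paper.
import time
import torch

def lstm_latency_ms(hidden_size, input_size=256, seq_len=64, batch=1,
                    warmup=10, iters=100, device="cpu"):
    """Median wall-clock latency (ms) of one forward pass."""
    lstm = torch.nn.LSTM(input_size, hidden_size).to(device).eval()
    x = torch.randn(seq_len, batch, input_size, device=device)
    with torch.no_grad():
        for _ in range(warmup):  # warm up caches / kernel autotuning
            lstm(x)
        times = []
        for _ in range(iters):
            if device == "cuda":
                torch.cuda.synchronize()  # GPU kernels launch asynchronously
            t0 = time.perf_counter()
            lstm(x)
            if device == "cuda":
                torch.cuda.synchronize()
            times.append((time.perf_counter() - t0) * 1e3)
    times.sort()
    return times[len(times) // 2]

if __name__ == "__main__":
    # On real hardware this curve is often not monotonic: a larger hidden
    # size can land in a faster kernel/tiling regime than a smaller one.
    for h in range(64, 1025, 32):
        print(f"hidden={h:4d}  latency={lstm_latency_ms(h):7.3f} ms")
```

A hardware-guided search in the spirit of this paper would then place layer dimensions at the low-latency points of the measured curve, rather than simply choosing the smallest dimension that meets the accuracy target.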

research
05/30/2018

Grow and Prune Compact, Fast, and Accurate LSTMs

Long short-term memory (LSTM) has been widely used for sequential data m...
research
08/17/2017

An Improved Residual LSTM Architecture for Acoustic Modeling

Long Short-Term Memory (LSTM) is the primary recurrent neural networks a...
research
12/05/2022

Algorithm and Hardware Co-Design of Energy-Efficient LSTM Networks for Video Recognition with Hierarchical Tucker Tensor Decomposition

Long short-term memory (LSTM) is a type of powerful deep neural network ...
research
04/19/2019

SCANN: Synthesis of Compact and Accurate Neural Networks

Artificial neural networks (ANNs) have become the driving force behind r...
research
05/27/2019

Efficient Network Construction through Structural Plasticity

Deep Neural Networks (DNNs) on hardware is facing excessive computation ...
research
11/07/2022

TDC: Towards Extremely Efficient CNNs on GPUs via Hardware-Aware Tucker Decomposition

Tucker decomposition is one of the SOTA CNN model compression techniques...
