Bag of Tricks for Optimizing Transformer Efficiency

09/09/2021
by Ye Lin, et al.

Improving Transformer efficiency has recently become an increasingly attractive goal. A wide range of methods has been proposed, e.g., pruning, quantization, and new architectures, but these methods are either sophisticated to implement or hardware-dependent. In this paper, we show that the efficiency of the Transformer can be improved by combining simple, hardware-agnostic methods, including hyper-parameter tuning, better design choices, and training strategies. On the WMT news translation tasks, we improve the inference efficiency of a strong Transformer system by 3.80X on CPU and 2.52X on GPU. The code is publicly available at https://github.com/Lollipop321/mini-decoder-network.
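The abstract does not spell out the design choices, but the repository name suggests a reduced decoder. One widely used, hardware-agnostic design choice of this kind is a deep-encoder/shallow-decoder configuration, since decoder layers dominate the cost of autoregressive decoding. The sketch below is illustrative only and is not the authors' implementation: the use of PyTorch's nn.Transformer and the specific layer counts are assumptions.

```python
# Minimal sketch (assumption, not the paper's code): speed up inference by
# keeping the encoder deep while shrinking the decoder.
import torch
import torch.nn as nn

# Hypothetical configuration: 6 encoder layers, 1 decoder layer.
model = nn.Transformer(
    d_model=512,
    nhead=8,
    num_encoder_layers=6,   # keep the encoder deep to preserve quality
    num_decoder_layers=1,   # shallow decoder for faster autoregressive decoding
    dim_feedforward=2048,
    batch_first=True,
)

src = torch.rand(2, 10, 512)   # (batch, src_len, d_model)
tgt = torch.rand(2, 7, 512)    # (batch, tgt_len, d_model)
out = model(src, tgt)          # -> shape (2, 7, 512)
print(out.shape)
```

Because each generated token reruns every decoder layer, cutting decoder depth reduces per-token latency roughly in proportion, while encoder depth is paid only once per source sentence.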


