Efficient Sparsely Activated Transformers

08/31/2022
by Salar Latifi, et al.

Transformer-based neural networks have achieved state-of-the-art task performance in a number of machine learning domains including natural language processing and computer vision. To further improve their accuracy, recent work has explored the integration of dynamic behavior into these networks in the form of mixture-of-expert (MoE) layers. In this paper, we explore the introduction of MoE layers to optimize a different metric: inference latency. We introduce a novel system named PLANER that takes an existing Transformer-based network and a user-defined latency target and produces an optimized, sparsely-activated version of the original network that tries to meet the latency target while maintaining baseline accuracy. We evaluate PLANER on two real-world language modeling tasks using the Transformer-XL network and achieve inference latency reductions of over 2x at iso-accuracy.
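To make the idea of a sparsely-activated MoE layer concrete, below is a minimal illustrative sketch of a mixture-of-experts feed-forward block with top-k routing in PyTorch. This is not PLANER or the paper's implementation; the class name `TopKMoEFFN` and the hyperparameter values are hypothetical. The sketch only shows the general mechanism the abstract refers to: a learned router activates k of the experts for each token, so per-token compute (and hence latency) scales with k rather than with the total number of experts.

```python
# Minimal sketch of a sparsely-activated mixture-of-experts (MoE) feed-forward
# layer with top-k routing. Illustrative only; not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoEFFN(nn.Module):
    def __init__(self, d_model, d_ff, num_experts, k=1):
        super().__init__()
        self.k = k
        # Router assigns each token a score per expert.
        self.router = nn.Linear(d_model, num_experts)
        # Each expert is an ordinary position-wise feed-forward block.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):
        # x: (num_tokens, d_model); flatten batch/sequence dims before calling.
        gate = F.softmax(self.router(x), dim=-1)        # (num_tokens, num_experts)
        weights, idx = gate.topk(self.k, dim=-1)        # keep only the top-k experts per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for slot in range(self.k):
                mask = idx[:, slot] == e                # tokens routed to expert e in this slot
                if mask.any():
                    # Only these tokens pass through expert e; others skip it entirely.
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

# Usage: this block would replace a dense FFN in a Transformer layer.
layer = TopKMoEFFN(d_model=512, d_ff=2048, num_experts=4, k=1)
tokens = torch.randn(10, 512)
print(layer(tokens).shape)  # torch.Size([10, 512])
```

With k=1, each token runs through a single expert, so the layer's per-token FLOPs match a dense FFN of the same width while total capacity grows with the number of experts; this is the kind of trade-off a latency-targeted optimizer can exploit.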


