ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design

by   Haoran You, et al.

Vision Transformers (ViTs) have achieved state-of-the-art performance on various vision tasks. However, ViTs' self-attention module is still arguably a major bottleneck, limiting their achievable hardware efficiency. Meanwhile, existing accelerators dedicated to NLP Transformers are not optimal for ViTs. This is because there is a large difference between ViTs and NLP Transformers: ViTs have a relatively fixed number of input tokens, whose attention maps can be pruned by up to 90 need to handle input sequences of varying numbers of tokens and rely on on-the-fly predictions of dynamic sparse attention patterns for each input to achieve a decent sparsity (e.g., >=50 algorithm and accelerator co-design framework dubbed ViTCoD for accelerating ViTs. Specifically, on the algorithm level, ViTCoD prunes and polarizes the attention maps to have either denser or sparser fixed patterns for regularizing two levels of workloads without hurting the accuracy, largely reducing the attention computations while leaving room for alleviating the remaining dominant data movements; on top of that, we further integrate a lightweight and learnable auto-encoder module to enable trading the dominant high-cost data movements for lower-cost computations. On the hardware level, we develop a dedicated accelerator to simultaneously coordinate the enforced denser/sparser workloads and encoder/decoder engines for boosted hardware utilization. Extensive experiments and ablation studies validate that ViTCoD largely reduces the dominant data movement costs, achieving speedups of up to 235.3x, 142.9x, 86.0x, 10.1x, and 6.8x over general computing platforms CPUs, EdgeGPUs, GPUs, and prior-art Transformer accelerators SpAtten and Sanger under an attention sparsity of 90


page 1

page 3

page 4

page 5

page 7

page 8

page 9


ViTALiTy: Unifying Low-rank and Sparse Approximation for Vision Transformer Acceleration with a Linear Taylor Attention

Vision Transformer (ViT) has emerged as a competitive alternative to con...

Transformer Acceleration with Dynamic Sparse Attention

Transformers are the mainstream of NLP applications and are becoming inc...

AccelTran: A Sparsity-Aware Accelerator for Dynamic Inference with Transformers

Self-attention-based transformer models have achieved tremendous success...

GCoD: Graph Convolutional Network Acceleration via Dedicated Algorithm and Accelerator Co-Design

Graph Convolutional Networks (GCNs) have emerged as the state-of-the-art...

X-Former: In-Memory Acceleration of Transformers

Transformers have achieved great success in a wide variety of natural la...

SALO: An Efficient Spatial Accelerator Enabling Hybrid Sparse Attention Mechanisms for Long Sequences

The attention mechanisms of transformers effectively extract pertinent i...

Breaking BERT: Evaluating and Optimizing Sparsified Attention

Transformers allow attention between all pairs of tokens, but there is r...

Please sign up or login with your details

Forgot password? Click here to reset