Accelerating BERT Inference for Sequence Labeling via Early-Exit

by Xiaonan Li, et al.

Both performance and efficiency are crucial factors for sequence labeling tasks in many real-world scenarios. Although pre-trained models (PTMs) have significantly improved the performance of various sequence labeling tasks, their computational cost is high. To alleviate this problem, we extend the recently successful early-exit mechanism to accelerate the inference of PTMs for sequence labeling tasks. However, existing early-exit mechanisms are designed specifically for sequence-level tasks rather than sequence labeling. In this paper, we first propose a simple extension of sentence-level early-exit for sequence labeling tasks. To further reduce the computational cost, we also propose a token-level early-exit mechanism that allows some tokens to exit early at different layers. Considering the local dependency inherent in sequence labeling, we employ a window-based criterion to decide whether each token should exit. Because token-level early-exit introduces a gap between training and inference, we add an extra self-sampling fine-tuning stage to alleviate it. Extensive experiments on three popular sequence labeling tasks show that our approach can save up to 66 with minimal performance degradation. Compared with competitive compressed models such as DistilBERT, our approach can achieve better performance under the same speed-up ratios of 2X, 3X, and 4X.
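The abstract does not spell out the exit criterion, so what follows is only a minimal sketch of a window-based token-level early exit under several assumptions of ours. First, each layer is assumed to have an internal classifier that outputs a label distribution per token. Second, a token's uncertainty is assumed to be the normalized entropy of that distribution. Third, its window-based uncertainty is assumed to be the maximum over neighbors within a half-width `k`. Fourth, tokens that have exited are assumed to keep the uncertainty from their exit layer, as if their hidden states were copied upward unchanged. The names `normalized_entropy`, `window_max`, `token_exit_layers` and `sentence_exit_layer` are hypothetical, and the probabilities are toy values rather than model outputs.

```python
import math
from typing import List, Optional, Sequence

# Hypothetical layout: layer_probs[l][i] is the label distribution that an
# (assumed) internal classifier after layer l predicts for token i.
LayerProbs = List[List[Sequence[float]]]


def normalized_entropy(probs: Sequence[float]) -> float:
    """Entropy of a label distribution divided by log(#labels), so it lies in [0, 1]."""
    if len(probs) < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))


def window_max(unc: List[float], k: int) -> List[float]:
    """Window-based uncertainty: max over tokens i-k .. i+k, clipped to the sentence."""
    n = len(unc)
    return [max(unc[max(0, i - k):min(n, i + k + 1)]) for i in range(n)]


def token_exit_layers(layer_probs: LayerProbs, threshold: float, k: int) -> List[int]:
    """Return the 0-based layer at which each token exits."""
    num_layers = len(layer_probs)
    num_tokens = len(layer_probs[0])
    exit_at: List[Optional[int]] = [None] * num_tokens
    unc = [1.0] * num_tokens
    for l in range(num_layers):
        # Only tokens still in the network get a fresh uncertainty; exited tokens
        # keep the value from their exit layer (assumption: hidden state copied upward).
        for i in range(num_tokens):
            if exit_at[i] is None:
                unc[i] = normalized_entropy(layer_probs[l][i])
        win = window_max(unc, k)
        for i in range(num_tokens):
            if exit_at[i] is None and win[i] < threshold:
                exit_at[i] = l
    # Tokens that never become confident enough are labeled by the last layer.
    return [num_layers - 1 if e is None else e for e in exit_at]


def sentence_exit_layer(layer_probs: LayerProbs, threshold: float) -> int:
    """Sentence-level variant: exit once the most uncertain token is confident enough."""
    for l, probs in enumerate(layer_probs):
        if max(normalized_entropy(p) for p in probs) < threshold:
            return l
    return len(layer_probs) - 1


# Toy example: 3 layers x 3 tokens, two labels per token.
U, C = [0.5, 0.5], [0.99, 0.01]  # uncertain / confident distributions
toy_layer_probs = [[U, C, U], [C, C, U], [C, C, C]]
print(token_exit_layers(toy_layer_probs, 0.1, k=0))  # [1, 0, 2]
print(token_exit_layers(toy_layer_probs, 0.1, k=1))  # [1, 2, 2]
print(sentence_exit_layer(toy_layer_probs, 0.1))     # 2
```

With `k=0` each token decides on its own, so the confident middle token leaves after the first layer. Widening the window holds a token back while a neighbor is still uncertain, which is one way to respect local dependency. Any `k` at least as large as the sentence makes every window cover the whole sentence, which reduces to the sentence-level extension: `token_exit_layers(toy_layer_probs, 0.1, k=10)` returns `[2, 2, 2]`.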


Constraint-aware and Ranking-distilled Token Pruning for Efficient Transformer Inference

Deploying pre-trained transformer models like BERT on downstream tasks i...

Magic Pyramid: Accelerating Inference with Early Exiting and Token Pruning

Pre-training and then fine-tuning large language models is commonly used...

TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference

Existing pre-trained language models (PLMs) are often computationally ex...

SkipDecode: Autoregressive Skip Decoding with Batching and Caching for Efficient LLM Inference

Autoregressive large language models (LLMs) have made remarkable progres...

Unifying Token and Span Level Supervisions for Few-Shot Sequence Labeling

Few-shot sequence labeling aims to identify novel classes based on only ...

Elbert: Fast Albert with Confidence-Window Based Early Exit

Despite the great success in Natural Language Processing (NLP) area, lar...

PALBERT: Teaching ALBERT to Ponder

Currently, pre-trained models can be considered the default choice for a...