A Win-win Deal: Towards Sparse and Robust Pre-trained Language Models

by   Yuanxin Liu, et al.

Despite the remarkable success of pre-trained language models (PLMs), they still face two challenges: First, large-scale PLMs are inefficient in terms of memory footprint and computation. Second, on the downstream tasks, PLMs tend to rely on the dataset bias and struggle to generalize to out-of-distribution (OOD) data. In response to the efficiency problem, recent studies show that dense PLMs can be replaced with sparse subnetworks without hurting the performance. Such subnetworks can be found in three scenarios: 1) the fine-tuned PLMs, 2) the raw PLMs and then fine-tuned in isolation, and even inside 3) PLMs without any parameter fine-tuning. However, these results are only obtained in the in-distribution (ID) setting. In this paper, we extend the study on PLMs subnetworks to the OOD setting, investigating whether sparsity and robustness to dataset bias can be achieved simultaneously. To this end, we conduct extensive experiments with the pre-trained BERT model on three natural language understanding (NLU) tasks. Our results demonstrate that sparse and robust subnetworks (SRNets) can consistently be found in BERT, across the aforementioned three scenarios, using different training and compression methods. Furthermore, we explore the upper bound of SRNets using the OOD information and show that there exist sparse and almost unbiased BERT subnetworks. Finally, we present 1) an analytical study that provides insights on how to promote the efficiency of SRNets searching process and 2) a solution to improve subnetworks' performance at high sparsity. The code is available at https://github.com/llyx97/sparse-and-robust-PLM.


page 1

page 2

page 3

page 4


DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models

Gigantic pre-trained models have become central to natural language proc...

Is Fine-tuning Needed? Pre-trained Language Models Are Near Perfect for Out-of-Domain Detection

Out-of-distribution (OOD) detection is a critical task for reliable pred...

Compressing And Debiasing Vision-Language Pre-Trained Models for Visual Question Answering

Despite the excellent performance of large-scale vision-language pre-tra...

Exploring the Universal Vulnerability of Prompt-based Learning Paradigm

Prompt-based learning paradigm bridges the gap between pre-training and ...

Transfer Learning Robustness in Multi-Class Categorization by Fine-Tuning Pre-Trained Contextualized Language Models

This study compares the effectiveness and robustness of multi-class cate...

Investigating Ensemble Methods for Model Robustness Improvement of Text Classifiers

Large pre-trained language models have shown remarkable performance over...

Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis, and LLMs Evaluations

This paper reexamines the research on out-of-distribution (OOD) robustne...

Please sign up or login with your details

Forgot password? Click here to reset