Arbitrary Few Parameters are Good Enough for Adapting Large-scale Pre-trained Language Models

by   Yusheng Su, et al.

Parameter-efficient tuning (PET) methods can effectively drive extremely large pre-trained language models (PLMs) by only training minimal parameters. Different PET methods utilize different manually designed modules. In a small PLM, there are usually noticeable performance differences among PET methods. Nevertheless, when a PLM's scale grows up to tens of billions of parameters, all PET methods achieve almost the same performance and even perform on par with the full-parameter fine-tuning method. Hence, we hypothesize that model scaling can mitigate the design differences (the module structures and the number of trainable parameters) among PET methods. To study this hypothesis, we introduce a more flexible PET method - arbitrary PET (APET) method - to be compatible with arbitrary module structures and any number of trainable parameters. Then, we experiment on 11 NLP tasks of 5 types and 2 representative PLMs. From our investigations, we find that the model scaling (1) mitigates the effects of the arbitrary module structure on the performance of tuning methods, and (2) enables the tuning methods to optimize fewer parameters to achieve the full-parameter fine-tuning performance. Intriguingly, we also observe that all tuning methods require almost the same number of trainable parameters to drive PLMs. We discuss this phenomenon and the above two findings collectively from optimization perspectives to fathom the mechanisms behind them. These conclusions not only demonstrate the positive impact of model scaling on tuning methods but disclose its mechanisms, which help us design more effective and efficient tuning methods on larger-scale PLMs.


page 4

page 6

page 15


Sparse Structure Search for Parameter-Efficient Tuning

Adapting large pre-trained models (PTMs) through fine-tuning imposes pro...

AlphaTuning: Quantization-Aware Parameter-Efficient Adaptation of Large-Scale Pre-Trained Language Models

There are growing interests in adapting large-scale language models usin...

IncreLoRA: Incremental Parameter Allocation Method for Parameter-Efficient Fine-tuning

With the increasing size of pre-trained language models (PLMs), fine-tun...

Evaluating the Impact of Model Scale for Compositional Generalization in Semantic Parsing

Despite their strong performance on many tasks, pre-trained language mod...

VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control

As the model size of pre-trained language models (PLMs) grows rapidly, f...

Parameter-Efficient Fine-Tuning Design Spaces

Parameter-efficient fine-tuning aims to achieve performance comparable t...

Empirical Analysis of the Strengths and Weaknesses of PEFT Techniques for LLMs

As foundation models continue to exponentially scale in size, efficient ...

Please sign up or login with your details

Forgot password? Click here to reset