Recyclable Tuning for Continual Pre-training

05/15/2023
by Yujia Qin, et al.

Continual pre-training is the paradigm in which pre-trained language models (PLMs) continually acquire fresh knowledge from growing data and are gradually upgraded. Before an upgraded PLM is released, we may have tuned the original PLM for various tasks and stored the adapted weights. However, when tuning the upgraded PLM, these outdated adapted weights are typically ignored and discarded, causing a potential waste of resources. We bring this issue to the forefront and contend that proper algorithms for recycling outdated adapted weights should be developed. To this end, we formulate the task of recyclable tuning for continual pre-training. In pilot studies, we find that after continual pre-training, the upgraded PLM remains compatible with the outdated adapted weights to some extent. Motivated by this finding, we analyze the connection between continually pre-trained PLMs from two novel aspects, i.e., mode connectivity and functional similarity. Based on the corresponding findings, we propose an initialization-based method and a distillation-based method for our task. We demonstrate that both methods improve the convergence and performance of tuning the upgraded PLM, and that they can be combined to achieve better performance. The source code is publicly available at https://github.com/thunlp/RecyclableTuning.
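
As a rough illustration of the two proposed methods, the PyTorch sketch below recycles an outdated checkpoint by (i) initializing the upgraded PLM from whatever outdated adapted weights still fit, and (ii) distilling the outdated adapted model's predictions while tuning the upgraded PLM. The function names, the shape-matching heuristic, and the alpha/temperature hyperparameters are illustrative assumptions, not the paper's exact procedure.

```python
import torch.nn.functional as F


def recycle_by_initialization(upgraded_model, outdated_state_dict):
    """Initialization-based recycling (sketch): start tuning the upgraded
    PLM from the outdated adapted weights wherever parameter names and
    shapes still match, instead of from the raw upgraded checkpoint."""
    new_sd = upgraded_model.state_dict()
    compatible = {
        name: weight for name, weight in outdated_state_dict.items()
        if name in new_sd and new_sd[name].shape == weight.shape
    }
    # strict=False: parameters not covered by `compatible` simply keep
    # the upgraded PLM's own values.
    upgraded_model.load_state_dict(compatible, strict=False)
    return upgraded_model


def distillation_loss(student_logits, teacher_logits, labels,
                      alpha=0.5, temperature=2.0):
    """Distillation-based recycling (sketch): the outdated adapted model
    serves as a teacher while the upgraded PLM is tuned on the task."""
    task_loss = F.cross_entropy(student_logits, labels)
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * task_loss + (1.0 - alpha) * kd_loss
```

Consistent with the abstract's observation that the two methods compose, one would in practice initialize the upgraded PLM from the recycled weights and then add the distillation term to the tuning objective.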
