Learning Better Masking for Better Language Model Pre-training

08/23/2022
by Dongjie Yang, et al.

Masked Language Modeling (MLM) has been widely used as the denoising objective in pre-training language models (PrLMs). Existing PrLMs commonly adopt a random-token masking strategy in which a fixed masking ratio is applied and every token is masked with equal probability throughout training. However, masking interacts with the model's pre-training status in complicated ways, and that status changes as training progresses. In this paper, we show that such time-invariant MLM settings for masking ratio and masked content are unlikely to deliver an optimal outcome, which motivates us to explore the influence of time-variant MLM settings. We propose two scheduled masking approaches that adaptively tune the masking ratio and masked content at different training stages, improving pre-training efficiency and effectiveness as verified on downstream tasks. Our work is a pioneering study of time-variant masking strategies for both ratio and content, and it offers a better understanding of how the masking ratio and masked content influence MLM pre-training.
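
To make the idea of a time-variant masking schedule concrete, the minimal sketch below implements one plausible instantiation: a masking ratio that decays linearly over training steps, applied to random-token masking. The linear form, its endpoints, and all function names here are illustrative assumptions for exposition, not the paper's exact schedules.

```python
import random

def masking_ratio(step, total_steps, start_ratio=0.30, end_ratio=0.15):
    """Hypothetical time-variant schedule: linearly decay the masking
    ratio from start_ratio to end_ratio over training. The linear form
    and the endpoint values are illustrative assumptions."""
    frac = min(step / total_steps, 1.0)
    return start_ratio + (end_ratio - start_ratio) * frac

def mask_tokens(token_ids, step, total_steps, mask_id=103):
    """Randomly replace a step-dependent fraction of tokens with the
    [MASK] id (103 is BERT's WordPiece [MASK] id) and return the masked
    sequence plus per-position labels for the MLM loss."""
    ratio = masking_ratio(step, total_steps)
    n_mask = max(1, int(len(token_ids) * ratio))
    positions = random.sample(range(len(token_ids)), n_mask)
    masked = list(token_ids)
    labels = [-100] * len(token_ids)  # -100: position ignored by the loss
    for p in positions:
        labels[p] = masked[p]  # predict the original token here
        masked[p] = mask_id
    return masked, labels

# Example: more tokens are masked early in training than late.
ids = list(range(1000, 1020))
early_masked, _ = mask_tokens(ids, step=0, total_steps=100_000)
late_masked, _ = mask_tokens(ids, step=100_000, total_steps=100_000)
```

Under such a schedule, the model faces a harder denoising task early in training and an easier one later; the paper's scheduled approaches additionally tune which contents are masked across training stages, not just the overall ratio.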
