Recipes for Sequential Pre-training of Multilingual Encoder and Seq2Seq Models

06/14/2023
by Saleh Soltan, et al.

Pre-trained encoder-only and sequence-to-sequence (seq2seq) models each have advantages; however, training both model types from scratch is computationally expensive. We explore recipes to improve pre-training efficiency by initializing one model from the other. (1) Extracting the encoder from a seq2seq model, we show it under-performs a Masked Language Modeling (MLM) encoder, particularly on sequence labeling tasks. Varying the masking scheme during seq2seq training, reducing the decoder size, and continuing with a small amount of MLM training do not close the gap. (2) Conversely, using an encoder to warm-start seq2seq training, we show that by unfreezing the encoder partway through training we can match the task performance of a from-scratch seq2seq model. Overall, this two-stage approach is an efficient recipe for obtaining both a multilingual encoder and a seq2seq model, matching the performance of training each model from scratch while reducing the total compute cost by 27%.
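To make the second recipe concrete, the sketch below warm-starts a toy PyTorch seq2seq model from an MLM-pre-trained encoder, trains with the encoder frozen, and unfreezes it partway through training. This is a minimal illustration, not the authors' code: the model sizes, the training loop, and the unfreeze point are all illustrative assumptions.

```python
# Minimal sketch (not the authors' implementation) of stage 2: warm-starting a
# seq2seq model from a pre-trained MLM encoder, then unfreezing the encoder
# partway through training. Sizes, step counts, and data are toy placeholders.
import torch
import torch.nn as nn

d_model, vocab = 512, 32000

# Stage 1 product: an MLM-pre-trained encoder (stand-in architecture here).
mlm_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True),
    num_layers=6,
)

class Seq2Seq(nn.Module):
    def __init__(self, encoder: nn.Module):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.encoder = encoder                      # warm-started from stage 1
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model=d_model, nhead=8, batch_first=True),
            num_layers=6,
        )
        self.lm_head = nn.Linear(d_model, vocab)

    def forward(self, src_ids, tgt_ids):
        memory = self.encoder(self.embed(src_ids))
        causal = nn.Transformer.generate_square_subsequent_mask(tgt_ids.size(1))
        hidden = self.decoder(self.embed(tgt_ids), memory, tgt_mask=causal)
        return self.lm_head(hidden)

def set_encoder_trainable(model: Seq2Seq, trainable: bool):
    for p in model.encoder.parameters():
        p.requires_grad = trainable

model = Seq2Seq(mlm_encoder)
set_encoder_trainable(model, False)            # begin with the encoder frozen
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

num_steps, unfreeze_at = 20, 10                # assumed schedule for the demo
for step in range(num_steps):
    if step == unfreeze_at:
        set_encoder_trainable(model, True)     # unfreeze partway through training
    # Random token ids stand in for a real denoising / translation batch.
    src = torch.randint(0, vocab, (8, 64))
    tgt = torch.randint(0, vocab, (8, 64))
    logits = model(src, tgt[:, :-1])
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, vocab), tgt[:, 1:].reshape(-1)
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Freezing the warm-started encoder at first lets the randomly initialized decoder adapt to it without disturbing the MLM representations; unfreezing later lets the full model close the remaining gap to from-scratch seq2seq training.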


Related research

02/01/2021
SJ_AJ@DravidianLangTech-EACL2021: Task-Adaptive Pre-Training of Multilingual BERT Models for Offensive Language Identification
In this paper we present our submission for the EACL 2021-Shared Task on...

02/18/2021
Less is More: Pre-training a Strong Siamese Encoder Using a Weak Decoder
Many real-world applications use Siamese networks to efficiently match t...

01/22/2020
Multilingual Denoising Pre-training for Neural Machine Translation
This paper demonstrates that multilingual denoising pre-training produce...

06/03/2021
nmT5 – Is parallel data still relevant for pre-training massively multilingual language models?
Recently, mT5 - a massively multilingual version of T5 - leveraged a uni...

03/11/2021
CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation
Pipelined NLP systems have largely been superseded by end-to-end neural ...

04/02/2021
HMM-Free Encoder Pre-Training for Streaming RNN Transducer
This work describes an encoder pre-training procedure using frame-wise l...

03/03/2023
RePreM: Representation Pre-training with Masked Model for Reinforcement Learning
Inspired by the recent success of sequence modeling in RL and the use of...
