GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator

12/20/2022
by Jian Yang, et al.

Pre-trained models have achieved remarkable success in natural language processing (NLP). However, existing pre-training methods underutilize the benefits of language understanding for generation. Inspired by the idea of Generative Adversarial Networks (GANs), we propose a GAN-style model for encoder-decoder pre-training by introducing an auxiliary discriminator, unifying the abilities of language understanding and generation in a single model. Our model, named GanLM, is trained with two pre-training objectives: replaced token detection and replaced token denoising. Specifically, given a masked source sentence, the generator outputs a distribution over the target tokens, and the discriminator predicts whether each target token sampled from this distribution is incorrect. The target sentence is then corrupted with the misclassified tokens to construct a noisy previous context, from which the model generates the gold sentence. Together, the two tasks improve both language understanding and generation by selectively using the denoising data. Extensive experiments on language generation benchmarks show that GanLM, with its strong language understanding capability, outperforms various strong pre-trained language models (PLMs) and achieves state-of-the-art performance.
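The abstract compresses the two objectives into a few sentences; the sketch below spells them out on a toy PyTorch encoder-decoder. It is a minimal illustration under stated assumptions, not the paper's implementation: the module names (ToyGenerator, ToyDiscriminator), the vocabulary size, MASK_ID, and the exact sampling and replacement details are all assumptions, and the decoder-input shifting and causal masking of a real seq2seq model are omitted for brevity.

import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, D_MODEL, MASK_ID = 1000, 64, 1  # illustrative constants, not from the paper

class ToyGenerator(nn.Module):
    """Toy encoder-decoder that predicts target tokens from a masked source."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        self.transformer = nn.Transformer(d_model=D_MODEL, nhead=4,
                                          num_encoder_layers=2,
                                          num_decoder_layers=2,
                                          batch_first=True)
        self.lm_head = nn.Linear(D_MODEL, VOCAB)

    def forward(self, src_ids, tgt_ids):
        h = self.transformer(self.embed(src_ids), self.embed(tgt_ids))
        return self.lm_head(h)  # (batch, tgt_len, VOCAB) logits

class ToyDiscriminator(nn.Module):
    """Token-level classifier: is each target token original or replaced?"""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True), 2)
        self.cls = nn.Linear(D_MODEL, 1)

    def forward(self, tgt_ids):
        return self.cls(self.encoder(self.embed(tgt_ids))).squeeze(-1)  # (batch, tgt_len)

def pretraining_step(gen, disc, src_masked, tgt_gold):
    # 1) Generator: predict the target distribution from the masked source.
    logits = gen(src_masked, tgt_gold)
    gen_loss = F.cross_entropy(logits.reshape(-1, VOCAB), tgt_gold.reshape(-1))

    # 2) Replaced token detection: sample tokens from the generator's output
    #    distribution and train the discriminator to flag the ones that
    #    differ from the gold target.
    with torch.no_grad():
        sampled = torch.distributions.Categorical(logits=logits).sample()
    is_replaced = (sampled != tgt_gold).float()
    disc_logits = disc(sampled)
    detect_loss = F.binary_cross_entropy_with_logits(disc_logits, is_replaced)

    # 3) Replaced token denoising: keep the tokens the discriminator
    #    misclassified to build a noisy previous context, then ask the
    #    generator to reconstruct the gold sentence from it.
    #    (One plausible reading of "selectively using the denoising data".)
    misclassified = ((disc_logits > 0).float() != is_replaced)
    noisy_tgt = torch.where(misclassified, sampled, tgt_gold)
    denoise_logits = gen(src_masked, noisy_tgt)
    denoise_loss = F.cross_entropy(denoise_logits.reshape(-1, VOCAB),
                                   tgt_gold.reshape(-1))

    return gen_loss + detect_loss + denoise_loss

# Toy usage: a batch of 2 sentences of length 8, with every third source
# position replaced by MASK_ID before running one pre-training step.
src = torch.randint(2, VOCAB, (2, 8))
tgt = torch.randint(2, VOCAB, (2, 8))
src_masked = src.clone()
src_masked[:, ::3] = MASK_ID
loss = pretraining_step(ToyGenerator(), ToyDiscriminator(), src_masked, tgt)
loss.backward()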


research  04/30/2020
SegaBERT: Pre-training of Segment-aware BERT for Language Understanding
Pre-trained language models have achieved state-of-the-art results in va...

research  09/13/2021
CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation
In this paper, we take the advantage of previous pre-trained models (PTM...

research  11/18/2021
DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing
This paper presents a new pre-trained language model, DeBERTaV3, which i...

research  06/28/2021
Knowledge Transfer by Discriminative Pre-training for Academic Performance Prediction
The needs for precisely estimating a student's academic performance have...

research  05/09/2022
Improving negation detection with negation-focused pre-training
Negation is a common linguistic feature that is crucial in many language...

research  06/10/2020
MC-BERT: Efficient Language Pre-Training via a Meta Controller
Pre-trained contextual representations (e.g., BERT) have become the foun...

research  06/25/2021
Learning to Sample Replacements for ELECTRA Pre-Training
ELECTRA pretrains a discriminator to detect replaced tokens, where the r...
