Infusing Sequential Information into Conditional Masked Translation Model with Self-Review Mechanism

10/19/2020
by   Pan Xie, et al.

Non-autoregressive models generate target words in parallel, which achieves faster decoding at the sacrifice of translation accuracy. To remedy the flawed translations produced by non-autoregressive models, a promising approach is to train a conditional masked translation model (CMTM) and refine the generated results over several iterations. Unfortunately, such an approach hardly considers the sequential dependency among target words, which inevitably results in degraded translations. Hence, instead of solely training a Transformer-based CMTM, we propose a Self-Review Mechanism to infuse sequential information into it. Concretely, we insert a left-to-right mask into the same decoder of the CMTM and induce it to autoregressively review whether each word generated by the CMTM should be replaced or kept. The experimental results (WMT14 En↔De and WMT16 En↔Ro) demonstrate that our model requires dramatically less training computation than the typical CMTM and outperforms several state-of-the-art non-autoregressive models by over 1 BLEU. Through knowledge distillation, our model even surpasses a typical left-to-right Transformer model, while significantly speeding up decoding.
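Below is a minimal sketch, not the authors' implementation, of the decode-and-review loop the abstract describes: a conditional masked model proposes all target tokens in parallel, and a left-to-right review pass, run under a causal attention mask on the same decoder, decides which tokens to keep and which to mask and re-predict in the next iteration. The helpers `cmtm_predict` and `review_scores` are hypothetical stand-ins for the real model calls and are faked with random outputs here.

```python
# Illustrative sketch of parallel proposal + autoregressive self-review.
# cmtm_predict / review_scores are hypothetical stand-ins, NOT the paper's code.
import torch


def causal_mask(length):
    """Left-to-right mask: position i may only attend to positions <= i."""
    return torch.tril(torch.ones(length, length, dtype=torch.bool))


def cmtm_predict(tokens, mask):
    """Hypothetical CMTM call: re-predicts the masked positions in parallel.
    Faked with random token ids for illustration."""
    vocab_size = 1000
    new_tokens = tokens.clone()
    new_tokens[mask] = torch.randint(0, vocab_size, (int(mask.sum()),))
    return new_tokens


def review_scores(tokens):
    """Hypothetical self-review call: the same decoder, run under causal_mask(),
    scores each generated token autoregressively (keep vs. replace).
    Faked with random scores for illustration."""
    _ = causal_mask(tokens.size(0))  # would be fed to the decoder's self-attention
    return torch.rand(tokens.shape)


def decode(length=8, iterations=3, keep_threshold=0.5):
    tokens = torch.zeros(length, dtype=torch.long)   # start fully masked ([MASK] id = 0)
    mask = torch.ones(length, dtype=torch.bool)      # every position needs predicting
    for _ in range(iterations):
        tokens = cmtm_predict(tokens, mask)          # parallel proposal
        scores = review_scores(tokens)               # sequential (left-to-right) review
        mask = scores < keep_threshold               # low-scoring tokens get re-predicted
        if not mask.any():                           # all tokens accepted by the review
            break
    return tokens


print(decode())
```

The key design point is that the review pass reuses the same decoder weights but swaps in a causal attention mask, so sequential dependencies are checked without training a second model.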

Related research

06/22/2019 · Retrieving Sequential Information for Non-Autoregressive Neural Machine Translation
Non-Autoregressive Transformer (NAT) aims to accelerate the Transformer ...

04/19/2019 · Constant-Time Machine Translation with Conditional Masked Language Models
Most machine translation systems generate text autoregressively, by sequ...

02/08/2019 · Insertion Transformer: Flexible Sequence Generation via Insertion Operations
We present the Insertion Transformer, an iterative, partially autoregres...

09/14/2021 · Non-autoregressive Transformer with Unified Bidirectional Decoder for Automatic Speech Recognition
Non-autoregressive (NAR) transformer models have been studied intensivel...

10/27/2020 · Fast Interleaved Bidirectional Sequence Generation
Independence assumptions during sequence generation can speed up inferen...

01/23/2020 · Semi-Autoregressive Training Improves Mask-Predict Decoding
The recently proposed mask-predict decoding algorithm has narrowed the p...

10/12/2022 · Non-Autoregressive Machine Translation with Translation Memories
Non-autoregressive machine translation (NAT) has recently made great pro...
