Revisiting Non-Autoregressive Translation at Scale

05/25/2023
by Zhihao Wang, et al.

In real-world systems, scaling has been critical for improving translation quality in autoregressive translation (AT), but its effect has not been well studied for non-autoregressive translation (NAT). In this work, we bridge that gap by systematically studying the impact of scaling on NAT behavior. Extensive experiments on six WMT benchmarks over two advanced NAT models show that scaling alleviates the commonly cited weaknesses of NAT models and yields better translation performance. To reduce the side effect of scaling on decoding speed, we empirically investigate how the NAT encoder and decoder each contribute to translation performance. Experimental results on the large-scale WMT20 En-De task show that an asymmetric architecture (i.e., a bigger encoder and a smaller decoder) achieves performance comparable to the uniformly scaled model while retaining the decoding-speed advantage of standard NAT models. Finally, we establish a new benchmark by validating scaled NAT models on the scaled dataset, which can serve as a strong baseline for future work. We release code, models, and system outputs at https://github.com/DeepLearnXMU/Scaling4NAT.
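The asymmetric design shifts most of the capacity into the encoder, which runs once per source sentence, while keeping the decoder shallow; since a NAT decoder predicts all target positions in a single parallel pass, a deep decoder dominates latency. Below is a minimal PyTorch sketch of that idea, not the authors' implementation: the layer counts (12-layer encoder, 1-layer decoder), dimensions, and placeholder decoder inputs are illustrative assumptions only.

```python
# Illustrative sketch of an asymmetric (deep-encoder, shallow-decoder) NAT setup.
# All hyperparameters are assumptions, not the paper's configuration.
import torch
import torch.nn as nn

d_model, n_heads, vocab = 512, 8, 32000
embed = nn.Embedding(vocab, d_model)

# Deep encoder: most of the capacity lives here and is run once per sentence.
enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
encoder = nn.TransformerEncoder(enc_layer, num_layers=12)

# Shallow decoder: NAT decodes all target positions in one parallel pass,
# so a small decoder keeps inference latency close to standard NAT.
dec_layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
decoder = nn.TransformerDecoder(dec_layer, num_layers=1)
proj = nn.Linear(d_model, vocab)

src = torch.randint(0, vocab, (2, 20))   # batch of source token ids
tgt_len = 22                             # assumed output of a length predictor
memory = encoder(embed(src))
# Non-autoregressive decoding: feed length-`tgt_len` placeholder inputs and
# decode without a causal mask or step-by-step generation loop.
dec_in = memory.mean(dim=1, keepdim=True).expand(-1, tgt_len, -1)
logits = proj(decoder(dec_in, memory))   # (batch, tgt_len, vocab)
tokens = logits.argmax(-1)
```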

