One Reference Is Not Enough: Diverse Distillation with Reference Selection for Non-Autoregressive Translation

05/28/2022
by   Chenze Shao, et al.

Non-autoregressive neural machine translation (NAT) suffers from the multi-modality problem: a source sentence may have multiple correct translations, but the loss function is computed against only a single reference sentence. Sequence-level knowledge distillation makes the target more deterministic by replacing the references with the outputs of an autoregressive teacher. However, the multi-modality problem in the distilled dataset remains non-negligible. Furthermore, learning from a single teacher caps the capability of the student, restricting the potential of NAT models. In this paper, we argue that one reference is not enough and propose diverse distillation with reference selection (DDRS) for NAT. Specifically, we first propose a method called SeedDiv for diverse machine translation, which enables us to generate a dataset containing multiple high-quality reference translations for each source sentence. During training, we compare the NAT output with all references and select the one that best fits the NAT output to train the model. Experiments on widely used machine translation benchmarks demonstrate the effectiveness of DDRS, which achieves 29.82 BLEU with only one decoding pass on WMT14 En-De, improving the state-of-the-art performance for NAT by over 1 BLEU. Source code: https://github.com/ictnlp/DDRS-NAT
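The reference-selection step described in the abstract admits a compact formulation: for each source sentence, compute the training loss against every available reference and back-propagate only the minimum. The sketch below illustrates this idea in PyTorch; it is not the released DDRS implementation, and names such as `ddrs_loss`, `nat_logits`, and `references` are illustrative placeholders, with all references assumed to be padded to the decoder's output length.

```python
import torch
import torch.nn.functional as F

def ddrs_loss(nat_logits, references, pad_id=0):
    """Minimal sketch of reference selection for NAT training.

    nat_logits: (T, V) token logits from a non-autoregressive decoder
                for one source sentence (T positions, vocab size V).
    references: list of K reference token sequences, each a LongTensor
                of length T (assumed padded/aligned to the decoder length).
    Returns the loss against the single reference that best fits the
    current NAT output, i.e. the reference with the lowest loss.
    """
    losses = []
    for ref in references:
        # Token-level cross-entropy of this reference under the NAT output.
        losses.append(F.cross_entropy(nat_logits, ref, ignore_index=pad_id))
    # Select the best-fitting reference and train only on it.
    return torch.stack(losses).min()
```

In the actual DDRS setup, the per-reference score would be the NAT model's own training criterion rather than plain token-level cross-entropy; the min-over-references selection is the part the abstract describes.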


Related research

11/30/2022  Rephrasing the Reference for Non-Autoregressive Machine Translation
11/21/2019  Minimizing the Bag-of-Ngrams Difference for Non-Autoregressive Neural Machine Translation
03/12/2023  Fuzzy Alignments in Directed Acyclic Graph for Non-Autoregressive Machine Translation
03/31/2023  Selective Knowledge Distillation for Non-Autoregressive Neural Machine Translation
09/14/2021  AligNART: Non-autoregressive Neural Machine Translation by Jointly Learning to Estimate Alignment and Translate
06/07/2022  DiMS: Distilling Multiple Steps of Iterative Non-Autoregressive Transformers
04/28/2022  UniTE: Unified Translation Evaluation
