Semi-Supervised Learning for Neural Machine Translation

06/15/2016
by Yong Cheng, et al.

While end-to-end neural machine translation (NMT) has made remarkable progress recently, NMT systems rely only on parallel corpora for parameter estimation. Since parallel corpora are usually limited in quantity, quality, and coverage, especially for low-resource languages, it is appealing to exploit monolingual corpora to improve NMT. We propose a semi-supervised approach for training NMT models on the concatenation of labeled (parallel corpora) and unlabeled (monolingual corpora) data. The central idea is to reconstruct the monolingual corpora using an autoencoder, in which the source-to-target and target-to-source translation models serve as the encoder and decoder, respectively. Our approach can exploit monolingual corpora of not only the target language but also the source language. Experiments on a Chinese-English dataset show that our approach achieves significant improvements over state-of-the-art SMT and NMT systems.
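The autoencoder idea in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the one-word "sentences", the probability tables `P_SRC2TGT` and `P_TGT2SRC`, the function names, and the weights `lam_src` and `lam_tgt` are all hypothetical and chosen only for illustration. The sketch assumes that a monolingual sentence is scored by translating it into the other language and back, and that this reconstruction score is added to the usual likelihood on parallel data.

```python
import math

# Hypothetical toy setup: every "sentence" is a single word and each
# translation model is just a table of conditional probabilities.
P_SRC2TGT = {"mao": {"cat": 0.9, "dog": 0.1}, "gou": {"cat": 0.2, "dog": 0.8}}
P_TGT2SRC = {"cat": {"mao": 0.7, "gou": 0.3}, "dog": {"mao": 0.4, "gou": 0.6}}


def reconstruction_prob(sentence, encoder, decoder):
    """Probability of recovering `sentence` after translating it and back.

    Sums over every latent translation z: P(z | sentence) * P(sentence | z).
    """
    return sum(p_z * decoder[z].get(sentence, 0.0)
               for z, p_z in encoder[sentence].items())


def objective(parallel, mono_src, mono_tgt, lam_src=1.0, lam_tgt=1.0):
    """Log-likelihood on parallel pairs plus weighted reconstruction terms."""
    # Supervised part: both translation directions on the labeled pairs.
    supervised = sum(math.log(P_SRC2TGT[x][y]) + math.log(P_TGT2SRC[y][x])
                     for x, y in parallel)
    # Source autoencoder: source-to-target encodes, target-to-source decodes.
    src_auto = sum(math.log(reconstruction_prob(x, P_SRC2TGT, P_TGT2SRC))
                   for x in mono_src)
    # Target autoencoder: the two models swap roles.
    tgt_auto = sum(math.log(reconstruction_prob(y, P_TGT2SRC, P_SRC2TGT))
                   for y in mono_tgt)
    return supervised + lam_src * src_auto + lam_tgt * tgt_auto


if __name__ == "__main__":
    print(reconstruction_prob("gou", P_SRC2TGT, P_TGT2SRC))
    print(objective([("mao", "cat")], ["gou"], ["dog"]))
```

The toy sums over every possible latent translation, which is only feasible because the vocabulary has two words. With real sentences that sum would have to be approximated, for example with a small list of candidate translations.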


