CipherDAug: Ciphertext based Data Augmentation for Neural Machine Translation

04/01/2022
by Nishant Kambhatla, et al.

We propose a novel data-augmentation technique for neural machine translation based on ROT-k ciphertexts. ROT-k is a simple letter substitution cipher that replaces each letter in the plaintext with the kth letter after it in the alphabet. We first generate multiple ROT-k ciphertexts of the plaintext, which is the source side of the parallel data, using different values of k. We then leverage this enciphered training data along with the original parallel data via multi-source training to improve neural machine translation. Our method, CipherDAug, uses a co-regularization-inspired training procedure, requires no external data sources other than the original training data, and uses a standard Transformer to outperform strong data augmentation techniques on several datasets by a significant margin. This technique combines easily with existing approaches to data augmentation and yields particularly strong results in low-resource settings.
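
The enciphering step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `rot_k` and `augment`, the choice of keys, the wrap-around at the end of the alphabet, the restriction to ASCII letters, and the pairing of each enciphered source with the unchanged target are all assumptions made for illustration. The multi-source training and the co-regularization-inspired objective are not shown.

```python
import string


def rot_k(text: str, k: int) -> str:
    """Shift each ASCII letter k places forward, wrapping around the alphabet.

    Assumption: characters other than a-z and A-Z are left unchanged.
    """
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    shift = k % 26
    table = str.maketrans(
        lower + upper,
        lower[shift:] + lower[:shift] + upper[shift:] + upper[:shift],
    )
    return text.translate(table)


def augment(pairs, keys=(1, 2)):
    """Return the original pairs plus one enciphered-source copy per key.

    `pairs` is a list of (source, target) strings. Only the source side is
    enciphered; keeping the target as-is is an assumption of this sketch.
    """
    augmented = list(pairs)
    for k in keys:
        augmented.extend((rot_k(src, k), tgt) for src, tgt in pairs)
    return augmented
```

With two keys, each original sentence pair yields three training pairs: the original and two whose source side is a different ROT-k ciphertext of the same sentence.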

