BERT, mBERT, or BiBERT? A Study on Contextualized Embeddings for Neural Machine Translation

09/09/2021
by Haoran Xu, et al.

The success of bidirectional encoders using masked language models, such as BERT, on numerous natural language processing tasks has prompted researchers to incorporate these pre-trained models into neural machine translation (NMT) systems. However, existing methods for incorporating pre-trained models are non-trivial and focus mainly on BERT, leaving unexamined the impact that other pre-trained models may have on translation performance. In this paper, we demonstrate that simply using the output (contextualized embeddings) of a tailored and suitable bilingual pre-trained language model (dubbed BiBERT) as the input of the NMT encoder achieves state-of-the-art translation performance. Moreover, we propose a stochastic layer selection approach and a dual-directional translation model to ensure sufficient utilization of the contextualized embeddings. Without using back translation, our best models achieve BLEU scores of 30.45 for En->De and 38.61 for De->En on the IWSLT'14 dataset, and 31.26 for En->De and 34.94 for De->En on the WMT'14 dataset, exceeding all published numbers.
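To make the core recipe concrete, below is a minimal sketch (in PyTorch with the Hugging Face transformers library) of one way to read it: take the contextualized embeddings from a pre-trained bidirectional encoder, optionally sampling a hidden layer at random per step ("stochastic layer selection"), and feed them to the NMT encoder in place of its learned token embeddings. This is an illustration, not the authors' released code: the checkpoint name stands in for the custom BiBERT model, and the uniform sampling scheme is an assumption.

import random
import torch
from transformers import AutoModel, AutoTokenizer

# Placeholder checkpoint; the paper's BiBERT is a custom bilingual model.
MODEL_NAME = "bert-base-multilingual-cased"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
bert = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True)
bert.eval()  # the pre-trained model acts as a frozen feature extractor in this sketch

def contextualized_embeddings(sentence: str, stochastic: bool = True) -> torch.Tensor:
    """Return one layer of contextualized embeddings for the NMT encoder.

    With stochastic=True, a hidden layer is sampled uniformly at random
    (an assumed reading of "stochastic layer selection"); otherwise the
    last layer is used. Output shape: (1, seq_len, hidden_size).
    """
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = bert(**inputs)
    hidden_states = outputs.hidden_states  # tuple: embedding layer + one entry per transformer layer
    layer = random.randrange(1, len(hidden_states)) if stochastic else -1
    return hidden_states[layer]

emb = contextualized_embeddings("BiBERT embeddings feed the NMT encoder.")
print(emb.shape)  # e.g. torch.Size([1, 12, 768])

In a full system, these embeddings would be projected (if dimensions differ) and consumed by the NMT encoder, and a dual-directional model would share this pipeline across both En->De and De->En training data.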

