Assessing the Bilingual Knowledge Learned by Neural Machine Translation Models

04/28/2020
by Shilin He, et al.

Machine translation (MT) systems translate text between languages by automatically learning in-depth knowledge of bilingual lexicons, grammar, and semantics from training examples. Although neural machine translation (NMT) now leads the field of MT, we have a poor understanding of how and why it works. In this paper, we bridge the gap by assessing the bilingual knowledge learned by NMT models with a phrase table, an interpretable table of bilingual lexicons. We extract the phrase table from the training examples that an NMT model predicts correctly. Extensive experiments on widely used datasets show that the phrase table is reasonable and consistent across language pairs and random seeds. Equipped with the interpretable phrase table, we find that NMT models learn patterns from simple to complex and distill essential bilingual knowledge from the training examples. We also revisit some advances that potentially affect the learning of bilingual knowledge (e.g., back-translation) and report some interesting findings. We believe this work opens a new angle for interpreting NMT with statistical models and provides empirical support for recent advances in improving NMT models.
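
To make the idea of a phrase table built only from correctly predicted training examples more concrete, here is a minimal Python sketch. It is not the authors' procedure, which the abstract does not spell out. It assumes that each training example comes with a word alignment (a list of source-index, target-index pairs) and a per-token boolean mask saying whether the NMT model predicted that target token correctly. It then applies the standard alignment-consistency check from phrase-based statistical MT, in a simplified form that does not extend phrases over unaligned words. The names `extract_phrase_pairs`, `build_phrase_table`, `correct`, and `max_len` are hypothetical.

```python
from collections import Counter


def extract_phrase_pairs(src, tgt, alignment, correct, max_len=3):
    """Return alignment-consistent phrase pairs for one sentence pair.

    src, tgt   : lists of tokens
    alignment  : iterable of (src_index, tgt_index) pairs (assumed given)
    correct    : list of bools, one per target token; True if the NMT model
                 predicted that token correctly (assumed given)
    max_len    : maximum phrase length on either side (illustrative choice)
    """
    alignment = list(alignment)
    pairs = []
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions aligned to the source span [i1, i2].
            tgt_pos = [j for (i, j) in alignment if i1 <= i <= i2]
            if not tgt_pos:
                continue
            j1, j2 = min(tgt_pos), max(tgt_pos)
            if j2 - j1 + 1 > max_len:
                continue
            # Consistency: no target word in [j1, j2] may align to a
            # source word outside [i1, i2].
            if any(j1 <= j <= j2 and not (i1 <= i <= i2)
                   for (i, j) in alignment):
                continue
            # Keep the pair only if every target token in the span
            # was predicted correctly by the model.
            if not all(correct[j1:j2 + 1]):
                continue
            pairs.append((" ".join(src[i1:i2 + 1]),
                          " ".join(tgt[j1:j2 + 1])))
    return pairs


def build_phrase_table(examples, max_len=3):
    """Count phrase pairs over (src, tgt, alignment, correct) examples."""
    counts = Counter()
    for src, tgt, alignment, correct in examples:
        counts.update(extract_phrase_pairs(src, tgt, alignment, correct, max_len))
    return counts
```

Under these assumptions, the resulting counts could be normalized into translation probabilities and compared across checkpoints, language pairs, or random seeds, in the spirit of the consistency analysis the abstract describes.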

research
08/10/2017

Neural Machine Translation Leveraging Phrase-based Models in a Hybrid Search

In this paper, we introduce a hybrid search for attention-based neural m...
research
05/25/2018

Phrase Table as Recommendation Memory for Neural Machine Translation

Neural Machine Translation (NMT) has drawn much attention due to its pro...
research
04/13/2020

Neural Machine Translation: Challenges, Progress and Future

Machine translation (MT) is a technique that leverages computers to tran...
research
11/26/2019

Neural Machine Translation with Explicit Phrase Alignment

While neural machine translation (NMT) has achieved state-of-the-art tra...
research
10/06/2020

Data Rejuvenation: Exploiting Inactive Training Examples for Neural Machine Translation

Large-scale training datasets lie at the core of the recent success of n...
research
03/14/2021

Crowdsourced Phrase-Based Tokenization for Low-Resourced Neural Machine Translation: The Case of Fon Language

Building effective neural machine translation (NMT) models for very low-...
research
04/05/2020

Understanding Learning Dynamics for Neural Machine Translation

Despite the great success of NMT, there still remains a severe challenge...
