Multi-Granularity Self-Attention for Neural Machine Translation

09/05/2019
by Jie Hao, et al.

Current state-of-the-art neural machine translation (NMT) uses a deep multi-head self-attention network with no explicit phrase information. However, prior work on statistical machine translation has shown that extending the basic translation unit from words to phrases produces substantial improvements, suggesting that NMT performance could likewise benefit from explicit phrase modeling. In this work, we present multi-granularity self-attention (MG-SA): a neural network that combines multi-head self-attention and phrase modeling. Specifically, we train several attention heads to attend to phrases in either n-gram or syntactic formalism. Moreover, we exploit interactions among phrases to enhance structure modeling, a commonly-cited weakness of self-attention. Experimental results on WMT14 English-to-German and NIST Chinese-to-English translation tasks show that the proposed approach consistently improves performance. Targeted linguistic analysis reveals that MG-SA indeed captures useful phrase information at various levels of granularity.
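
To make the idea concrete, below is a minimal PyTorch sketch of one way multi-granularity heads could work: queries stay token-level, while some heads attend over phrase-level keys and values. It assumes mean-pooled, non-overlapping n-grams as the phrase composition and a hypothetical per-head granularity assignment (head_grams); the paper's actual composition functions, syntactic phrases, and phrase-interaction terms are not reproduced here.

import torch
import torch.nn as nn
import torch.nn.functional as F


def ngram_phrases(x: torch.Tensor, n: int) -> torch.Tensor:
    """Mean-pool a (batch, seq_len, d) tensor into non-overlapping n-gram phrases."""
    b, t, d = x.shape
    pad = (n - t % n) % n
    if pad:
        x = F.pad(x, (0, 0, 0, pad))            # pad the time dimension to a multiple of n
    return x.reshape(b, -1, n, d).mean(dim=2)    # (batch, ceil(t/n), d)


class MultiGranularitySelfAttention(nn.Module):
    """Sketch: each head attends to token-level or n-gram phrase-level memory."""

    def __init__(self, d_model: int = 512, n_heads: int = 8,
                 head_grams=(1, 1, 1, 1, 2, 2, 3, 3)):
        super().__init__()
        assert d_model % n_heads == 0 and len(head_grams) == n_heads
        self.d_head = d_model // n_heads
        self.head_grams = head_grams             # granularity (n-gram size) per head
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        outputs = []
        for h, n in enumerate(self.head_grams):
            sl = slice(h * self.d_head, (h + 1) * self.d_head)
            q = self.q(x)[..., sl]                       # queries remain token-level
            mem = x if n == 1 else ngram_phrases(x, n)   # phrase-level memory for n > 1
            k = self.k(mem)[..., sl]
            v = self.v(mem)[..., sl]
            attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
            outputs.append(attn @ v)                     # (batch, seq_len, d_head)
        return self.out(torch.cat(outputs, dim=-1))


if __name__ == "__main__":
    layer = MultiGranularitySelfAttention()
    y = layer(torch.randn(2, 10, 512))
    print(y.shape)  # torch.Size([2, 10, 512])

Keeping queries at the token level while varying the granularity of keys and values lets heads with n > 1 summarize each position against phrase chunks rather than individual tokens; this is only one plausible reading of the abstract, not the authors' released implementation.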


Related research

Towards Neural Phrase-based Machine Translation (06/17/2017)
In this paper, we present Neural Phrase-based Machine Translation (NPMT)...

Continuous Decomposition of Granularity for Neural Paraphrase Generation (09/05/2022)
While Transformers have had significant success in paragraph generation,...

From Balustrades to Pierre Vinken: Looking for Syntax in Transformer Self-Attentions (06/05/2019)
We inspect the multi-head self-attention in Transformer NMT encoders for...

BattRAE: Bidimensional Attention-Based Recursive Autoencoders for Learning Bilingual Phrase Embeddings (05/25/2016)
In this paper, we propose a bidimensional attention based recursive auto...

Progressive Multi-Granularity Training for Non-Autoregressive Translation (06/10/2021)
Non-autoregressive translation (NAT) significantly accelerates the infer...

Linguistically-Informed Self-Attention for Semantic Role Labeling (04/23/2018)
The current state-of-the-art end-to-end semantic role labeling (SRL) mod...

Learning Source Phrase Representations for Neural Machine Translation (06/25/2020)
The Transformer translation model (Vaswani et al., 2017) based on a mult...
