Joint Source-Target Self Attention with Locality Constraints

05/16/2019
by José A. R. Fonollosa, et al.

The dominant neural machine translation models are based on the encoder-decoder structure, and many of them rely on an unconstrained receptive field over the source and target sequences. In this paper we study a new architecture that breaks with both conventions. Our simplified architecture consists of the decoder part of a transformer model, based on self-attention, but with locality constraints applied to the attention receptive field. For training, both the source and target sentences are fed to the network, which is trained as a language model. At inference time, the target tokens are predicted autoregressively, with the source sequence serving as the preceding context. The proposed model achieves a new state of the art of 35.7 BLEU on IWSLT'14 German-English and matches the best reported results in the literature on the WMT'14 English-German and WMT'14 English-French translation benchmarks.
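
To make the two ingredients concrete, here is a minimal sketch, assuming a PyTorch implementation: a causal attention mask combined with a locality window, applied to a joint source-target sequence. The window size, vocabulary size, dimensions, and token ids are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

def local_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask where True marks positions a query must NOT attend to,
    following the attn_mask convention of torch.nn.MultiheadAttention.

    Query position i may attend only to key positions j with
    i - window < j <= i: the usual causal (autoregressive) constraint
    plus a locality constraint on the attention receptive field.
    """
    i = torch.arange(seq_len).unsqueeze(1)  # query index, shape (L, 1)
    j = torch.arange(seq_len).unsqueeze(0)  # key index,   shape (1, L)
    allowed = (j <= i) & (j > i - window)
    return ~allowed  # True = masked out

# Train on the concatenation [source tokens ; target tokens] as one
# sequence, with the locality window limiting how far back each
# position can look. All sizes below are hypothetical.
src = torch.randint(0, 1000, (1, 12))  # source token ids
tgt = torch.randint(0, 1000, (1, 10))  # target token ids
tokens = torch.cat([src, tgt], dim=1)  # joint sequence, length 22

embed = nn.Embedding(1000, 64)
attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)

x = embed(tokens)
mask = local_causal_mask(tokens.size(1), window=8)
out, _ = attn(x, x, x, attn_mask=mask)
print(out.shape)  # torch.Size([1, 22, 64])
```

At inference time the same masked attention would be run over the source tokens alone, appending one predicted target token at a time. The paper's exact locality pattern may differ (e.g. per-layer window sizes); the fixed window here is only an illustration.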


Related research

09/06/2019 · Improving Neural Machine Translation with Parent-Scaled Self-Attention
Most neural machine translation (NMT) models operate on source and targe...

10/31/2016 · Neural Machine Translation in Linear Time
We present a novel neural network for processing sequences. The ByteNet ...

06/12/2017 · Attention Is All You Need
The dominant sequence transduction models are based on complex recurrent...

11/02/2020 · Focus on the present: a regularization method for the ASR source-target attention layer
This paper introduces a novel method to diagnose the source-target atten...

12/20/2022 · Receptive Field Alignment Enables Transformer Length Extrapolation
Length extrapolation is a desirable property that permits training a tra...

05/17/2018 · Cross-Target Stance Classification with Self-Attention Networks
In stance classification, the target on which the stance is made defines...

04/05/2019 · Modeling Recurrence for Transformer
Recently, the Transformer model that is based solely on attention mechan...
