A Novel Graph-based Multi-modal Fusion Encoder for Neural Machine Translation

07/17/2020
by   Yongjing Yin, et al.
Multi-modal neural machine translation (NMT) aims to translate source sentences paired with images into a target language. However, dominant multi-modal NMT models do not fully exploit fine-grained semantic correspondences between semantic units of different modalities, which have the potential to refine multi-modal representation learning. To address this issue, we propose a novel graph-based multi-modal fusion encoder for NMT. Specifically, we first represent the input sentence and image as a unified multi-modal graph, which captures various semantic relationships between multi-modal semantic units (words and visual objects). We then stack multiple graph-based multi-modal fusion layers that iteratively perform semantic interactions to learn node representations. Finally, these representations provide an attention-based context vector for the decoder. We evaluate our proposed encoder on the Multi30K datasets. Experimental results and in-depth analysis show the superiority of our multi-modal NMT model.
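To make the fusion step concrete, the following is a minimal NumPy sketch of one graph-based multi-modal fusion layer. It is an illustrative simplification, not the paper's implementation: all function and variable names (`fusion_layer`, `H`, `A`, etc.) are hypothetical, a single masked self-attention over the multi-modal graph stands in for the paper's separate intra-modal and inter-modal fusion sub-layers, and the adjacency matrix is a toy example rather than one built from parsed alignments.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fusion_layer(H, A, Wq, Wk, Wv):
    """One simplified fusion step: each node attends only to its graph
    neighbors, so word nodes and visual-object nodes exchange information
    along the edges of the multi-modal graph. (The paper uses separate
    intra-/inter-modal attentions; here one masked attention covers both.)"""
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    scores = np.where(A > 0, scores, -1e9)   # mask out non-neighbors
    return softmax(scores, axis=-1) @ V

rng = np.random.default_rng(0)
d = 8
# Toy multi-modal graph: 4 word nodes followed by 2 visual-object nodes
H = rng.normal(size=(6, d))
A = np.ones((6, 6))             # fully connected, including self-loops
A[0, 5] = A[5, 0] = 0           # drop one word-object alignment edge
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
H1 = fusion_layer(H, A, Wq, Wk, Wv)  # updated node representations
print(H1.shape)
```

Stacking several such layers (with residual connections and feed-forward sub-layers, as in a Transformer encoder) yields the iteratively fused node representations that the decoder attends over.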


Related research

11/28/2018  Unsupervised Multi-modal Neural Machine Translation
Unsupervised neural machine translation (UNMT) has recently achieved rem...

03/16/2021  Gumbel-Attention for Multi-modal Machine Translation
Multi-modal machine translation (MMT) improves translation quality by in...

01/23/2017  Incorporating Global Visual Features into Attention-Based Neural Machine Translation
We introduce multi-modal, attention-based neural machine translation (NM...

02/03/2017  Multilingual Multi-modal Embeddings for Natural Language Processing
We propose a novel discriminative model that learns embeddings from mult...

04/20/2019  Multi-modal gated recurrent units for image description
Using a natural language sentence to describe the content of an image is...

05/19/2022  Support-set based Multi-modal Representation Enhancement for Video Captioning
Video captioning is a challenging task that necessitates a thorough comp...

05/24/2023  Collaborative Recommendation Model Based on Multi-modal Multi-view Attention Network: Movie and literature cases
The existing collaborative recommendation models that use multi-modal in...
