ViTA: Visual-Linguistic Translation by Aligning Object Tags

06/01/2021
by   Kshitij Gupta, et al.
0

Multimodal Machine Translation (MMT) enriches the source text with visual information for translation. It has gained popularity in recent years, and several pipelines have been proposed in the same direction. Yet, the task lacks quality datasets to illustrate the contribution of visual modality in the translation systems. In this paper, we propose our system for the Multimodal Translation Task of WAT 2021 from English to Hindi. We propose to use mBART, a pretrained multilingual sequence-to-sequence model, for the textual-only translations. Further, we bring the visual information to a textual domain by extracting object tags from the image and enhance the input for the multimodal task. We also explore the robustness of our system by systematically degrading the source text. Finally, we achieve a BLEU score of 44.6 and 51.6 on the test set and challenge set of the task.

READ FULL TEXT
research
03/20/2019

Probing the Need for Visual Context in Multimodal Machine Translation

Current work on multimodal machine translation (MMT) has suggested that ...
research
08/02/2022

Silo NLP's Participation at WAT2022

This paper provides the system description of "Silo NLP's" submission to...
research
05/20/2023

Scene Graph as Pivoting: Inference-time Image-free Unsupervised Multimodal Machine Translation with Visual Scene Hallucination

In this work, we investigate a more realistic unsupervised multimodal ma...
research
12/09/2017

Modulating and attending the source image during encoding improves Multimodal Translation

We propose a new and fully end-to-end approach for multimodal translatio...
research
07/21/2022

Unimodal vs. Multimodal Siamese Networks for Outfit Completion

The popularity of online fashion shopping continues to grow. The ability...
research
12/20/2022

Beyond Triplet: Leveraging the Most Data for Multimodal Machine Translation

Multimodal machine translation (MMT) aims to improve translation quality...
research
08/05/2019

Predicting Actions to Help Predict Translations

We address the task of text translation on the How2 dataset using a stat...

Please sign up or login with your details

Forgot password? Click here to reset