Removing Word-Level Spurious Alignment between Images and Pseudo-Captions in Unsupervised Image Captioning

04/28/2021
by   Ukyo Honda, et al.
0

Unsupervised image captioning is a challenging task that aims at generating captions without the supervision of image-sentence pairs, but only with images and sentences drawn from different sources and object labels detected from the images. In previous work, pseudo-captions, i.e., sentences that contain the detected object labels, were assigned to a given image. The focus of the previous work was on the alignment of input images and pseudo-captions at the sentence level. However, pseudo-captions contain many words that are irrelevant to a given image. In this work, we investigate the effect of removing mismatched words from image-sentence alignment to determine how they make this task difficult. We propose a simple gating mechanism that is trained to align image features with only the most reliable words in pseudo-captions: the detected object labels. The experimental results show that our proposed method outperforms the previous methods without introducing complex sentence-level learning objectives. Combined with the sentence-level alignment method of previous work, our method further improves its performance. These results confirm the importance of careful alignment in word-level details.

READ FULL TEXT
research
10/06/2017

Contrastive Learning for Image Captioning

Image captioning, a popular topic in computer vision, has achieved subst...
research
10/31/2019

Hidden State Guidance: Improving Image Captioning using An Image Conditioned Autoencoder

Most RNN-based image captioning models receive supervision on the output...
research
06/04/2022

Automated Audio Captioning with Epochal Difficult Captions for Curriculum Learning

In this paper, we propose an algorithm, Epochal Difficult Captions, to s...
research
07/27/2020

Decomposed Generation Networks with Structure Prediction for Recipe Generation from Food Images

Recipe generation from food images and ingredients is a challenging task...
research
08/25/2019

Towards Unsupervised Image Captioning with Shared Multimodal Embeddings

Understanding images without explicit supervision has become an importan...
research
05/03/2016

Improving Image Captioning by Concept-based Sentence Reranking

This paper describes our winning entry in the ImageCLEF 2015 image sente...
research
12/06/2022

Switching to Discriminative Image Captioning by Relieving a Bottleneck of Reinforcement Learning

Discriminativeness is a desirable feature of image captions: captions sh...

Please sign up or login with your details

Forgot password? Click here to reset