Switching to Discriminative Image Captioning by Relieving a Bottleneck of Reinforcement Learning

by Ukyo Honda, et al.

Discriminativeness is a desirable feature of image captions: captions should describe the characteristic details of input images. However, recent high-performing captioning models, which are trained with reinforcement learning (RL), tend to generate overly generic captions despite their high performance on various other criteria. First, we investigate the cause of this unexpectedly low discriminativeness and show that RL has a deeply rooted side effect of limiting the output words to high-frequency words. The limited vocabulary is a severe bottleneck for discriminativeness because a model cannot easily describe details that lie beyond its vocabulary. Then, based on this identification of the bottleneck, we recast discriminative image captioning as a much simpler task of encouraging low-frequency word generation. Inspired by long-tail classification and debiasing methods, we propose methods that easily switch off-the-shelf RL models to discriminativeness-aware models with only a single epoch of fine-tuning on a subset of the parameters. Extensive experiments demonstrate that our methods significantly enhance the discriminativeness of off-the-shelf RL models and even outperform previous discriminativeness-aware methods at much smaller computational cost. Detailed analysis and human evaluation also verify that our methods boost discriminativeness without sacrificing the overall quality of captions.
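To make the vocabulary bottleneck concrete, the following is a minimal sketch, not the authors' implementation. It measures how many distinct words a set of captions uses and derives per-word weights that favor low-frequency words, in the spirit of re-weighting schemes from long-tail classification. The function names, the whitespace tokenization, the exponent `alpha`, and the toy captions are all assumptions made for illustration.

```python
from collections import Counter


def word_frequencies(captions):
    """Count whitespace-separated, lowercased tokens over a list of captions."""
    counts = Counter()
    for caption in captions:
        counts.update(caption.lower().split())
    return counts


def vocabulary_size(captions):
    """Number of distinct words used across the captions."""
    return len(word_frequencies(captions))


def low_frequency_weights(counts, alpha=0.5):
    """Hypothetical re-weighting: weight is proportional to count ** -alpha.

    Rarer words receive larger weights. The weights are rescaled so that
    their mean over the vocabulary is 1.
    """
    raw = {word: count ** -alpha for word, count in counts.items()}
    mean = sum(raw.values()) / len(raw)
    return {word: value / mean for word, value in raw.items()}


# Toy data (invented for illustration only).
reference_captions = [
    "a brown dog catches a red frisbee",
    "a man rides a tall horse",
]
generic_captions = [
    "a dog with a frisbee",
    "a man on a horse",
]

reference_vocab = vocabulary_size(reference_captions)
generic_vocab = vocabulary_size(generic_captions)
weights = low_frequency_weights(word_frequencies(reference_captions))
```

In a sketch like this, such weights could scale a per-token training loss so that rare words contribute more during fine-tuning. Whether that matches the methods proposed in the paper is not something the abstract specifies.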




