Context-Aware Group Captioning via Self-Attention and Contrastive Features

04/07/2020
by Zhuowan Li, et al.

While image captioning has progressed rapidly, existing work focuses mainly on describing single images. In this paper, we introduce a new task, context-aware group captioning, which aims to describe a group of target images in the context of another group of related reference images. Context-aware group captioning requires not only summarizing information from both the target and reference image groups but also contrasting the two. To solve this problem, we propose a framework that combines a self-attention mechanism with contrastive feature construction to summarize the common information in each image group while capturing the discriminative information between them. To build the dataset for this task, we group the images and generate the group captions from single-image captions using scene graph matching. Our datasets are constructed on top of the public Conceptual Captions dataset and our new Stock Captions dataset. Experiments on the two datasets show the effectiveness of our method on this new task. Related datasets and code are released at https://lizw14.github.io/project/groupcap.
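As a rough illustration of the idea in the abstract, the sketch below summarizes an image group with self-attention and then builds a contrastive feature. It is not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the single-head attention with identity projections, the mean pooling, and the choice to form the contrastive feature as the target-group summary minus the summary of the joint (target plus reference) group. Image features are plain lists of floats standing in for CNN embeddings.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def self_attend(features):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the raw features
    (identity projections); a real model would learn these projections.
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    attended = []
    for query in features:
        weights = softmax([dot(query, key) / scale for key in features])
        attended.append(
            [sum(w * value[i] for w, value in zip(weights, features))
             for i in range(dim)]
        )
    return attended


def mean_pool(features):
    """Average a list of vectors into one summary vector."""
    count = len(features)
    return [sum(column) / count for column in zip(*features)]


def group_representation(target, reference):
    """Concatenate the target summary with a contrastive feature.

    Assumption: the contrastive feature is the target summary minus
    the summary of the joint group (target + reference images).
    """
    target_summary = mean_pool(self_attend(target))
    joint_summary = mean_pool(self_attend(target + reference))
    contrast = [t - j for t, j in zip(target_summary, joint_summary)]
    return target_summary + contrast


# Toy 2-D features: the target images differ from the reference images
# along the second dimension.
target_group = [[1.0, 1.0], [1.0, 0.9]]
reference_group = [[1.0, 0.0], [1.0, 0.1]]
representation = group_representation(target_group, reference_group)
```

In this toy setup the second half of `representation` holds the contrastive part. When the target and reference groups are identical it is zero, so a caption decoder conditioned on this vector would see no distinguishing signal in that case.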

research
01/11/2017

Context-aware Captions from Context-agnostic Supervision

We introduce an inference technique to produce discriminative context-aw...
research
02/04/2023

Transform, Contrast and Tell: Coherent Entity-Aware Multi-Image Captioning

Coherent entity-aware multi-image captioning aims to generate coherent c...
research
07/22/2022

Rethinking the Reference-based Distinctive Image Captioning

Distinctive Image Captioning (DIC) – generating distinctive captions tha...
research
06/15/2023

Pragmatic Inference with a CLIP Listener for Contrastive Captioning

We propose a simple yet effective and robust method for contrastive capt...
research
10/16/2021

Self-Annotated Training for Controllable Image Captioning

The Controllable Image Captioning (CIC) task aims to generate captions c...
research
02/03/2021

L2C: Describing Visual Differences Needs Semantic Understanding of Individuals

Recent advances in language and vision push forward the research of capt...
research
07/19/2022

Relational Future Captioning Model for Explaining Likely Collisions in Daily Tasks

Domestic service robots that support daily tasks are a promising solutio...
