With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning

08/23/2023
by   Manuele Barraco, et al.

Image captioning, like many tasks involving vision and language, currently relies on Transformer-based architectures for extracting the semantics of an image and translating them into linguistically coherent descriptions. Although successful, the attention operator considers only a weighted summation of projections of the current input sample, ignoring the relevant semantic information that can come from the joint observation of other samples. In this paper, we devise a network that can perform attention over activations obtained while processing other training samples, through a prototypical memory model. Our memory models the distribution of past keys and values through the definition of prototype vectors which are both discriminative and compact. Experimentally, we assess the performance of the proposed model on the COCO dataset, in comparison with carefully designed baselines and state-of-the-art approaches, and by investigating the role of each of the proposed components. We demonstrate that our proposal can increase the performance of an encoder-decoder Transformer by 3.7 CIDEr points, both when training with cross-entropy only and when fine-tuning with self-critical sequence training. Source code and trained models are available at: https://github.com/aimagelab/PMA-Net.
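The core idea in the abstract, attending over prototype vectors that summarize keys and values seen while processing other training samples, can be sketched in a few lines. This is an illustrative approximation, not the paper's method: PMA-Net learns prototypes to be discriminative and compact, whereas here we simply cluster a bank of past key/value activations with k-means and extend standard scaled dot-product attention with the resulting prototypes. All function names and shapes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def build_prototypes(past_k, past_v, n_proto, n_iter=10, seed=0):
    """Summarize a bank of past keys/values into paired prototypes:
    k-means over the keys, then per-cluster averaging of the values.
    (Stand-in for the paper's learned discriminative prototypes.)"""
    rng = np.random.default_rng(seed)
    proto_k = past_k[rng.choice(len(past_k), n_proto, replace=False)].copy()
    for _ in range(n_iter):
        assign = ((past_k[:, None] - proto_k[None]) ** 2).sum(-1).argmin(1)
        for j in range(n_proto):
            members = assign == j
            if members.any():
                proto_k[j] = past_k[members].mean(0)
    proto_v = np.stack([
        past_v[assign == j].mean(0) if (assign == j).any()
        else np.zeros(past_v.shape[1])
        for j in range(n_proto)
    ])
    return proto_k, proto_v

def attention_with_memory(q, k, v, proto_k, proto_v):
    """Scaled dot-product attention whose key/value sets are extended with
    memory prototypes, so each query can also attend to summaries of
    activations from other training samples."""
    k_ext = np.concatenate([k, proto_k], axis=0)
    v_ext = np.concatenate([v, proto_v], axis=0)
    scores = q @ k_ext.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v_ext

# Toy usage: 3 queries, 10 current keys/values, 8 prototypes from 200 past ones.
rng = np.random.default_rng(1)
d = 16
past_k, past_v = rng.normal(size=(200, d)), rng.normal(size=(200, d))
proto_k, proto_v = build_prototypes(past_k, past_v, n_proto=8)
q = rng.normal(size=(3, d))
k, v = rng.normal(size=(10, d)), rng.normal(size=(10, d))
out = attention_with_memory(q, k, v, proto_k, proto_v)  # shape (3, d)
```

The design choice to concatenate prototypes along the key/value axis means the memory is queried by the same attention operator as the current sample, with no change to the query side.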


Related research:

- 02/11/2022, ACORT: A Compact Object Relation Transformer for Parameter Efficient Image Captioning. Recent research that applies Transformer-based architectures to image ca...
- 12/06/2022, Semantic-Conditional Diffusion Networks for Image Captioning. Recent advances on text-to-image generation have witnessed the rise of d...
- 12/17/2019, M^2: Meshed-Memory Transformer for Image Captioning. Transformer-based architectures represent the state of the art in sequen...
- 02/21/2022, CaMEL: Mean Teacher Learning for Image Captioning. Describing images in natural language is a fundamental step towards the ...
- 10/07/2021, End-to-End Supermask Pruning: Learning to Prune Image Captioning Models. With the advancement of deep models, research work on image captioning h...
- 05/20/2023, A request for clarity over the End of Sequence token in the Self-Critical Sequence Training. The Image Captioning research field is currently compromised by the lack...
- 01/06/2022, Compact Bidirectional Transformer for Image Captioning. Most current image captioning models typically generate captions from le...
