ROME: Testing Image Captioning Systems via Recursive Object Melting

by Boxi Yu, et al.

Image captioning (IC) systems aim to generate a text description of the salient objects in an image. In recent years, IC systems have been increasingly integrated into daily life, for example in assistive tools for visually impaired people and in description generation in Microsoft PowerPoint. However, even cutting-edge IC systems (e.g., Microsoft Azure Cognitive Services) and algorithms (e.g., OFA) can produce erroneous captions, leading to incorrect captioning of important objects, misunderstanding, and threats to personal safety. Existing testing approaches either fail to handle the complex form of IC system output (i.e., sentences in natural language) or generate unnatural images as test cases. To address these problems, we introduce Recursive Object MElting (Rome), a novel metamorphic testing approach for validating IC systems. Unlike existing approaches that generate test cases by inserting objects, which can easily make the generated images unnatural, Rome melts (i.e., removes and inpaints) objects. Rome assumes that the set of objects in the caption of an image includes the set of objects in the caption of the image generated from it by object melting. Given an image, Rome can recursively remove its objects to generate different pairs of images. We use Rome to test one widely adopted image captioning API and four state-of-the-art (SOTA) algorithms. The results show that the test cases generated by Rome look much more natural than those generated by the SOTA IC testing approach and achieve naturalness comparable to that of the original images. Meanwhile, by generating test pairs from 226 seed images, Rome reports a total of 9,121 erroneous issues with high precision (86.47% …). In addition, we use the test cases generated by Rome to retrain Oscar, which improves its performance across multiple evaluation metrics.
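The metamorphic relation and the recursive pair generation described in the abstract can be sketched in Python as follows. This is a minimal illustration under simplifying assumptions, not Rome's actual implementation: an image is modeled only by the set of objects it contains, object melting is modeled as deleting one object from that set (Rome itself removes and inpaints the object in pixel space), and objects are extracted from captions by naive token matching against a small hypothetical vocabulary, `OBJECT_VOCAB`. The function names `caption_objects`, `melting_pairs`, and `mr_violations` are likewise hypothetical.

```python
from typing import FrozenSet, Iterator, Set, Tuple

# Hypothetical object vocabulary (an assumption for illustration); a real
# extractor would also handle synonyms, plurals, and multi-word object names.
OBJECT_VOCAB: Set[str] = {"dog", "cat", "person", "bicycle", "car", "frisbee"}


def caption_objects(caption: str, vocab: Set[str] = OBJECT_VOCAB) -> Set[str]:
    """Return the vocabulary objects mentioned in a caption (naive token match)."""
    tokens = caption.lower().replace(",", " ").replace(".", " ").split()
    return {tok for tok in tokens if tok in vocab}


def melting_pairs(
    objects: FrozenSet[str],
) -> Iterator[Tuple[FrozenSet[str], FrozenSet[str]]]:
    """Yield (source, follow-up) object sets by recursively melting one object.

    Images are modeled only by their object sets (a simplifying assumption);
    each follow-up has exactly one object fewer than its source.
    """
    seen: Set[FrozenSet[str]] = set()
    stack = [objects]
    while stack:
        current = stack.pop()
        for obj in current:
            child = current - {obj}
            yield current, child
            # Recurse into the follow-up image unless it has no objects left
            # or has already been reached by melting objects in another order.
            if child and child not in seen:
                seen.add(child)
                stack.append(child)


def mr_violations(source_caption: str, followup_caption: str) -> Set[str]:
    """Objects in the follow-up caption that are absent from the source caption.

    Under the relation "objects(source caption) includes objects(follow-up
    caption)", a non-empty result flags a suspicious caption pair.
    """
    return caption_objects(followup_caption) - caption_objects(source_caption)


# Usage: a follow-up caption that introduces a new object violates the relation.
source = "A person riding a bicycle next to a dog."
followup = "A person riding a bicycle next to a cat."
print(mr_violations(source, followup))  # {'cat'}

for src, dst in melting_pairs(frozenset({"dog", "person"})):
    print(sorted(src), "->", sorted(dst))
```

In this sketch, an image with n objects yields n * 2^(n-1) source/follow-up pairs, since every non-empty subset of its objects serves as a source once and can lose any one of its members. Any non-empty result from `mr_violations` marks a candidate erroneous caption for inspection.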




