Good, Better, Best: Textual Distractors Generation for Multi-Choice VQA via Policy Gradient

by   Jiaying Lu, et al.

Textual distractors in current multi-choice VQA datasets are not challenging enough for state-of-the-art neural models. To better assess whether well-trained VQA models are vulnerable to potential attacks such as more challenging distractors, we introduce a novel task called textual Distractor Generation for VQA (DG-VQA). The goal of DG-VQA is to generate the most confusing distractors for multi-choice VQA items, each represented as a tuple of an image, a question, and the correct answer. Such distractors expose the vulnerability of neural models. We show that distractor generation can be formulated as a Markov decision process, and present a reinforcement learning solution that produces distractors in an unsupervised manner. Our solution addresses the lack of large annotated corpora that hampers classical distractor generation methods. Our proposed model receives reward signals from well-trained multi-choice VQA models and updates its parameters via policy gradient. The empirical results show that the generated textual distractors successfully confuse several cutting-edge models, degrading their accuracy by an average of 20%. Furthermore, we conduct additional adversarial training to improve the robustness of VQA models by incorporating the generated distractors. The experiment validates the effectiveness of adversarial training, showing a performance improvement of 27%.
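The policy-gradient idea in the abstract can be illustrated with a toy REINFORCE loop. This is a minimal sketch, not the paper's model: the candidate pool, the `mock_vqa_reward` function (standing in for a frozen VQA model's confusion signal, i.e. the probability mass it assigns to a distractor), and all hyperparameters are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical candidate distractor pool; the paper generates free-form text.
CANDIDATES = ["red", "blue", "two", "a dog"]

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def mock_vqa_reward(distractor):
    """Stand-in for a frozen, well-trained VQA model: returns how much
    probability mass the model would put on this distractor. Here 'two'
    is assumed to be the most confusing option."""
    return {"red": 0.1, "blue": 0.2, "two": 0.9, "a dog": 0.3}[distractor]

def train(steps=2000, lr=0.5):
    logits = np.zeros(len(CANDIDATES))  # policy parameters
    baseline = 0.0                      # moving-average reward baseline
    for _ in range(steps):
        probs = softmax(logits)
        a = rng.choice(len(CANDIDATES), p=probs)   # sample a distractor
        r = mock_vqa_reward(CANDIDATES[a])         # reward from VQA model
        baseline = 0.9 * baseline + 0.1 * r
        # REINFORCE: grad of log pi(a) w.r.t. logits is one_hot(a) - probs
        grad = -probs
        grad[a] += 1.0
        logits += lr * (r - baseline) * grad
    return softmax(logits)

probs = train()
```

After training, the policy concentrates its probability on the candidate the (mock) VQA model finds most confusing, mirroring how reward from a frozen target model can drive the generator without any annotated distractor corpus.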


