VQA-LOL: Visual Question Answering under the Lens of Logic

by Tejas Gokhale, et al.

Logical connectives and their implications on the meaning of a natural language sentence are a fundamental aspect of understanding. In this paper, we investigate visual question answering (VQA) through the lens of logical transformation and posit that systems that seek to answer questions about images must be robust to these transformations of the question. If a VQA system is able to answer a question, it should also be able to answer the logical composition of questions. We analyze the performance of state-of-the-art models on the VQA task under these logical operations and show that they have difficulty in correctly answering such questions. We then construct an augmentation of the VQA dataset with questions containing logical operations and retrain the same models to establish a baseline. We further propose a novel methodology to train models to learn negation, conjunction, and disjunction and show improvement in learning logical composition while retaining performance on VQA. We suggest this work as a move towards embedding logical connectives in visual understanding, along with the benefits of robustness and generalizability. Our code and dataset are available online at https://www.public.asu.edu/~tgokhale/vqa_lol.html
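To make the idea of logical composition concrete, the sketch below shows how the ground-truth answer to a composed yes/no question follows from the answers to its parts. It is only an illustration under stated assumptions: the `YesNoQuestion` type, the helper names, and the bracketed question templates are hypothetical and are not taken from the paper's code or dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class YesNoQuestion:
    """A closed-ended question about an image with a boolean answer.

    Hypothetical container used only for this illustration.
    """
    text: str
    answer: bool


def negate(q: YesNoQuestion) -> YesNoQuestion:
    # Negation flips the ground-truth answer.
    return YesNoQuestion(f"not ({q.text})", not q.answer)


def conjoin(a: YesNoQuestion, b: YesNoQuestion) -> YesNoQuestion:
    # A conjunction is true only when both component answers are true.
    return YesNoQuestion(f"({a.text}) and ({b.text})", a.answer and b.answer)


def disjoin(a: YesNoQuestion, b: YesNoQuestion) -> YesNoQuestion:
    # A disjunction is true when at least one component answer is true.
    return YesNoQuestion(f"({a.text}) or ({b.text})", a.answer or b.answer)


# Two made-up questions about the same (imaginary) image.
q1 = YesNoQuestion("Is there a dog?", True)
q2 = YesNoQuestion("Is the sky green?", False)

composed = conjoin(q1, negate(q2))
print(composed.text, "->", "yes" if composed.answer else "no")
```

A model that answers `q1` and `q2` correctly but fails on `composed` shows the kind of brittleness to logical transformation that the abstract describes.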




