Knowing Where to Look? Analysis on Attention of Visual Question Answering System

10/09/2018
by Wei Li, et al.

Attention mechanisms have been widely used in Visual Question Answering (VQA) solutions due to their capacity to model deep cross-domain interactions. Analyzing attention maps offers a way to uncover the limitations of current VQA systems and an opportunity to improve them. In this paper, we select two state-of-the-art VQA approaches with attention mechanisms and study their robustness and weaknesses by visualizing and analyzing their estimated attention maps. We find that both methods are sensitive to features and that both perform poorly on counting and multi-object questions. We believe that these findings and the analytical method will help researchers identify crucial challenges in improving their own VQA systems.


