VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation

08/15/2017
by Chuang Gan et al.

Rich and dense human-labeled datasets are among the main enabling factors for recent advances in vision-language understanding. Many seemingly distant annotations (e.g., semantic segmentation and visual question answering (VQA)) are inherently connected, in that they reveal different levels and perspectives of human understanding of the same visual scenes, and even of the same set of images (e.g., those in COCO). The popularity of COCO already correlates these annotations and tasks; explicitly linking them may significantly benefit both the individual tasks and unified vision-language modeling. We present preliminary work on linking the instance segmentations provided by COCO to the questions and answers (QAs) in the VQA dataset, and we name the collected links visual questions and segmentation answers (VQS). These links transfer human supervision between previously separate tasks, offer more effective leverage for existing problems, and open the door to new research problems and models. We study two applications of the VQS data in this paper: supervised attention for VQA and a novel question-focused semantic segmentation task. For the former, we obtain state-of-the-art results on the VQA real multiple-choice task simply by augmenting multilayer perceptrons with attention features that are learned using the segmentation-QA links as explicit supervision. To put the latter in perspective, we study two plausible methods and compare them to an oracle method that assumes the instance segmentations are given at the test stage.
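To make the supervised-attention idea concrete, the sketch below shows one plausible reading of it: an MLP-based VQA model whose attention over image regions is trained directly against segmentation-derived targets from the VQS links, alongside the usual answer-classification loss. This is not the authors' implementation; it assumes region-level CNN features and a pooling of the linked instance mask onto the region grid, and all module, tensor, and parameter names are illustrative.

```python
# Minimal sketch (assumed, not the paper's code) of segmentation-supervised
# attention for VQA: an attention MLP scores image regions given the question,
# and the attention distribution is trained to match a mask-derived target.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SupervisedAttentionVQA(nn.Module):
    def __init__(self, img_dim=2048, q_dim=1024, n_answers=3000):
        super().__init__()
        # Scores each region conditioned on the question.
        self.att_mlp = nn.Sequential(
            nn.Linear(img_dim + q_dim, 512), nn.ReLU(), nn.Linear(512, 1))
        # Classifies the answer from the attended image feature and the question.
        self.answer_mlp = nn.Sequential(
            nn.Linear(img_dim + q_dim, 1024), nn.ReLU(), nn.Linear(1024, n_answers))

    def forward(self, img_feats, q_feat):
        # img_feats: (B, R, img_dim) region features; q_feat: (B, q_dim).
        q_tiled = q_feat.unsqueeze(1).expand(-1, img_feats.size(1), -1)
        att_logits = self.att_mlp(torch.cat([img_feats, q_tiled], dim=-1)).squeeze(-1)
        att = F.softmax(att_logits, dim=1)          # (B, R) attention over regions
        attended = (att.unsqueeze(-1) * img_feats).sum(dim=1)
        logits = self.answer_mlp(torch.cat([attended, q_feat], dim=-1))
        return logits, att

def vqs_losses(logits, att, answer_ids, seg_target):
    # seg_target: (B, R) hypothetical attention target, e.g. the linked COCO
    # instance mask pooled onto the region grid and normalized to sum to 1.
    ans_loss = F.cross_entropy(logits, answer_ids)
    att_loss = F.kl_div(att.clamp_min(1e-8).log(), seg_target,
                        reduction="batchmean")
    return ans_loss + att_loss
```

The design choice worth noting is that the segmentation-QA link acts purely as an auxiliary supervision signal on the attention distribution, so at test time the model runs without any segmentation input.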


