Domain-robust VQA with diverse datasets and methods but no target labels

03/29/2021
by Mingda Zhang, et al.

The observation that computer vision methods overfit to dataset specifics has inspired diverse attempts to make object recognition models robust to domain shifts. However, similar work on domain-robust visual question answering (VQA) methods is very limited. Domain adaptation for VQA differs from adaptation for object recognition due to additional complexity: VQA models handle multimodal inputs, methods contain multiple steps with diverse modules, resulting in complex optimization, and answer spaces in different datasets are vastly different. To tackle these challenges, we first quantify domain shifts between popular VQA datasets, in both visual and textual space. To disentangle shifts arising from the different modalities, we also construct synthetic shifts in the image and question domains separately. Second, we test the robustness of different families of VQA methods (classic two-stream, transformer, and neuro-symbolic methods) to these shifts. Third, we test the applicability of existing domain adaptation methods and devise a new one, adjusted to specific VQA models, to bridge VQA domain gaps. To emulate real-world generalization, we focus on unsupervised domain adaptation and the open-ended classification task formulation.
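The abstract does not specify which shift metric the paper uses, but a standard way to quantify the gap between two datasets' feature distributions is the maximum mean discrepancy (MMD). The sketch below is a minimal illustration under that assumption; the feature arrays (`feats_a`, `feats_b`), the RBF bandwidth `gamma`, and the 2048-d pooled-CNN-feature shape are all hypothetical stand-ins for embeddings extracted from two real VQA datasets, not the authors' actual pipeline.

```python
import numpy as np


def rbf_kernel(x, y, gamma):
    """RBF kernel matrix between the rows of x (n, d) and y (m, d)."""
    sq_dists = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    return np.exp(-gamma * sq_dists)


def mmd2(feats_a, feats_b, gamma=1e-3):
    """Simple (biased) estimator of squared MMD between two feature samples.

    A larger value indicates a larger shift between the two distributions.
    """
    k_aa = rbf_kernel(feats_a, feats_a, gamma)
    k_bb = rbf_kernel(feats_b, feats_b, gamma)
    k_ab = rbf_kernel(feats_a, feats_b, gamma)
    return k_aa.mean() + k_bb.mean() - 2.0 * k_ab.mean()


# Random arrays stand in for pre-extracted image (or question) embeddings
# from two datasets; the mean offset fakes a mild domain shift.
rng = np.random.default_rng(0)
feats_a = rng.normal(0.0, 1.0, size=(500, 2048))
feats_b = rng.normal(0.3, 1.0, size=(500, 2048))
print(f"estimated squared MMD: {mmd2(feats_a, feats_b):.4f}")
```

Running the same estimator separately on image embeddings and on question embeddings would mirror the abstract's goal of disentangling visual from textual shift between dataset pairs.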
