AGQA 2.0: An Updated Benchmark for Compositional Spatio-Temporal Reasoning

Prior benchmarks have analyzed models' answers to questions about videos to measure visual compositional reasoning. Action Genome Question Answering (AGQA) is one such benchmark. AGQA provides a training/test split with balanced answer distributions to reduce the effect of linguistic biases. However, some biases remain in several AGQA categories. We introduce AGQA 2.0, a version of this benchmark with several improvements, most notably a stricter balancing procedure. We then report results on the updated benchmark for all experiments.


