The Wisdom of MaSSeS: Majority, Subjectivity, and Semantic Similarity in the Evaluation of VQA

09/12/2018
by   Shailza Jolly, et al.

We introduce MaSSeS, a simple evaluation metric for the task of Visual Question Answering (VQA). In its standard form, the VQA task is operationalized as follows: given an image and an open-ended question in natural language, systems are required to provide a suitable answer. Currently, model performance is evaluated by means of a somewhat simplistic metric: if the predicted answer is chosen by at least 3 human annotators out of 10, then it is 100% correct. Though intuitively valuable, this metric has some important limitations. First, it ignores whether the predicted answer is the one selected by the Majority (MA) of annotators. Second, it does not account for the quantitative Subjectivity (S) of the answers in the sample (and dataset). Third, information about the Semantic Similarity (SES) of the responses is completely neglected. To address these limitations, we propose a multi-component metric that accounts for all three issues. We show that our metric provides a more fine-grained evaluation at both the quantitative and the qualitative level.
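To make the first limitation concrete, the following is a minimal sketch of the threshold-based metric described above, not the authors' code and not an implementation of MaSSeS. It assumes the commonly used partial-credit form min(n/3, 1), where n is the number of annotators who gave the predicted answer; the abstract itself only states the full-credit case. The function names, the `threshold` parameter, and the example answers are hypothetical.

```python
from collections import Counter


def vqa_accuracy(predicted, human_answers, threshold=3):
    """Threshold-based VQA accuracy (assumed form: min(n / threshold, 1)).

    `n` is the number of annotators whose answer equals `predicted`.
    The score reaches 1.0 as soon as `threshold` annotators agree.
    """
    matches = Counter(human_answers)[predicted]  # 0 if nobody gave this answer
    return min(matches / threshold, 1.0)


def is_majority_answer(predicted, human_answers):
    """Return True if `predicted` is (one of) the most frequent human answers."""
    counts = Counter(human_answers)
    if not counts:
        return False
    return counts[predicted] == max(counts.values())


# Hypothetical sample: 10 annotators, 6 say "red", 3 say "maroon", 1 says "pink".
answers = ["red"] * 6 + ["maroon"] * 3 + ["pink"]

for candidate in ("red", "maroon", "pink"):
    score = vqa_accuracy(candidate, answers)
    majority = is_majority_answer(candidate, answers)
    print(f"{candidate}: accuracy={score:.2f}, majority={majority}")
```

In this sample, "red" and "maroon" both receive full credit under the threshold rule even though only "red" is the majority answer, which is the distinction the Majority component is meant to capture.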


research
10/24/2020

Beyond VQA: Generating Multi-word Answer and Rationale to Visual Questions

Visual Question Answering is a multi-modal task that aims to measure hig...
research
04/28/2022

Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly

Machine learning has advanced dramatically, narrowing the accuracy gap t...
research
08/29/2016

Visual Question: Predicting If a Crowd Will Agree on the Answer

Visual question answering (VQA) systems are emerging from a desire to em...
research
01/31/2020

Augmenting Visual Question Answering with Semantic Frame Information in a Multitask Learning Approach

Visual Question Answering (VQA) concerns providing answers to Natural La...
research
06/10/2020

Estimating semantic structure for the VQA answer space

Since its appearance, Visual Question Answering (VQA, i.e. answering a q...
research
05/24/2023

Measuring Faithful and Plausible Visual Grounding in VQA

Metrics for Visual Grounding (VG) in Visual Question Answering (VQA) sys...
