Controlling for Stereotypes in Multimodal Language Model Evaluation

02/03/2023
by Manuj Malik, et al.

We propose a methodology and design two benchmark sets for measuring the extent to which vision-and-language models use the visual signal in the presence or absence of stereotypes. The first benchmark tests for stereotypical colors of common objects, while the second considers gender stereotypes. The key idea is to compare predictions when the image conforms to the stereotype with predictions when it does not. Our results show significant variation among multimodal models: the recent Transformer-based FLAVA seems to be more sensitive to the choice of image and less affected by stereotypes than older CNN-based models such as VisualBERT and LXMERT. This effect is more discernible in this type of controlled setting than in traditional evaluations, where we do not know whether the model relied on the stereotype or the visual signal.
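The paired comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation or their metric: the record layout, the function name `summarize`, the three rates, and the toy color predictions are all assumptions made here for illustration. Each record holds a model's prediction for the same object shown once in an image that conforms to the stereotype and once in an image that contradicts it.

```python
from dataclasses import dataclass


@dataclass
class PairedPrediction:
    """One object shown in two images (hypothetical record layout)."""
    stereotype: str        # stereotypical answer, e.g. "yellow" for a banana
    shown_in_counter: str  # attribute actually visible in the counter-stereotypical image
    pred_conforming: str   # model prediction when the image matches the stereotype
    pred_counter: str      # model prediction when the image contradicts the stereotype


def summarize(pairs):
    """Return three illustrative rates over the counter-stereotypical images."""
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one pair")
    # The prediction matches what the counter-stereotypical image shows.
    follows_image = sum(p.pred_counter == p.shown_in_counter for p in pairs)
    # The prediction falls back on the stereotype despite the image.
    follows_stereotype = sum(p.pred_counter == p.stereotype for p in pairs)
    # The prediction differs between the two images of the same object.
    changed = sum(p.pred_conforming != p.pred_counter for p in pairs)
    return {
        "follows_image": follows_image / n,
        "follows_stereotype": follows_stereotype / n,
        "changed": changed / n,
    }


# Toy data, invented for this sketch; not results from the paper.
pairs = [
    PairedPrediction("yellow", "blue", "yellow", "blue"),
    PairedPrediction("red", "green", "red", "red"),
    PairedPrediction("green", "purple", "green", "purple"),
    PairedPrediction("orange", "black", "orange", "white"),
]
print(summarize(pairs))
```

Under these assumptions, a model that relies on the visual signal would score high on `follows_image`, while a model that relies on the stereotype would score high on `follows_stereotype`. An evaluation that uses only stereotype-conforming images cannot separate the two cases, since both kinds of model would give the same answer.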

research
07/14/2023

Are words equally surprising in audio and audio-visual comprehension?

We report a controlled study investigating the effect of visual informat...
research
06/18/2021

GEM: A General Evaluation Benchmark for Multimodal Tasks

In this paper, we present GEM as a General Evaluation benchmark for Mult...
research
09/23/2021

Transferring Knowledge from Vision to Language: How to Achieve it and how to Measure it?

Large language models are known to suffer from the hallucination problem...
research
06/25/2021

Multimodal Few-Shot Learning with Frozen Language Models

When trained at sufficient scale, auto-regressive language models exhibi...
research
10/15/2021

The World of an Octopus: How Reporting Bias Influences a Language Model's Perception of Color

Recent work has raised concerns about the inherent limitations of text-o...
research
03/16/2023

MultiModal Bias: Introducing a Framework for Stereotypical Bias Assessment beyond Gender and Race in Vision Language Models

Recent breakthroughs in self supervised training have led to a new class...
research
08/31/2018

Nightmare at test time: How punctuation prevents parsers from generalizing

Punctuation is a strong indicator of syntactic structure, and parsers tr...
