Intrinsic Bias Metrics Do Not Correlate with Application Bias

Natural Language Processing (NLP) systems learn harmful societal biases that can cause them to spread inequality widely as they are deployed in more and more situations. To address and combat this, the NLP community relies on a variety of metrics to identify and quantify bias in black-box models and to guide efforts at debiasing. Some of these metrics are intrinsic, measured in word embedding spaces; others are extrinsic, measuring the bias present in the downstream tasks that the word embeddings are plugged into. This research examines whether easy-to-measure intrinsic metrics correlate well with real-world extrinsic metrics. We measure both intrinsic and extrinsic bias across hundreds of trained models covering different tasks and experimental conditions, and find that no reliable correlation between these metrics holds in all scenarios across tasks and languages. We advise that efforts to debias embedding spaces always be paired with measurement of downstream model bias, and suggest that the community increase its efforts to make downstream measurement more feasible by creating additional challenge sets and annotated test data. We additionally release code, a new intrinsic metric, and an annotated test set for gender bias in hate speech detection.
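The central question here is whether intrinsic bias scores track extrinsic ones across many trained models. The following is a minimal sketch of how such agreement might be quantified, assuming each trained model has already been reduced to one intrinsic score and one extrinsic score. The function names (`pearson`, `ranks`, `spearman`) are hypothetical illustrations, not the authors' released code, and the choice of Pearson and Spearman coefficients is an assumption about a reasonable way to measure the relationship.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Covariance and variances, all unnormalized (the 1/n factors cancel).
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical per-model scores (one entry per trained model); not real data.
intrinsic_scores = [0.12, 0.40, 0.25, 0.31, 0.05]
extrinsic_scores = [0.30, 0.10, 0.22, 0.35, 0.18]
print("Pearson: ", pearson(intrinsic_scores, extrinsic_scores))
print("Spearman:", spearman(intrinsic_scores, extrinsic_scores))
```

In practice `scipy.stats.pearsonr` and `scipy.stats.spearmanr` compute the same coefficients along with p-values. Consistent with the finding above, a coefficient computed for one task, language, or experimental condition should not be assumed to carry over to another.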




