Fair Hate Speech Detection through Evaluation of Social Group Counterfactuals

10/24/2020
by Aida Mostafazadeh Davani, et al.

Approaches for mitigating bias in supervised models are designed to reduce a model's dependence on specific sensitive features of the input data, e.g., mentioned social groups. In hate speech detection, however, equalizing the effects of social groups is not always desirable, because those groups play an essential role in distinguishing outgroup-derogatory hate: certain types of hateful rhetoric carry their intended meaning only when contextualized around particular social group tokens. Counterfactual token fairness for a mentioned social group evaluates whether the model's predictions are the same for (a) the actual sentence and (b) a counterfactual instance generated by changing the mentioned social group in the sentence. Our approach ensures robust model predictions for counterfactuals that carry a meaning similar to the actual sentence. To quantify the similarity of a sentence and its counterfactual, we compare their likelihood scores as computed by generative language models. By equalizing model behavior on each sentence and its similar counterfactuals, we mitigate bias in the proposed model while preserving its overall classification performance.
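The abstract describes two steps: generating counterfactuals by swapping the mentioned social group, and keeping only those counterfactuals whose generative-LM likelihood is close to the original sentence's. The sketch below is not the authors' code; it illustrates the idea with GPT-2 scoring via the Hugging Face transformers library. The simple string-replacement counterfactual generation, the function names, and the log-likelihood `threshold` are illustrative assumptions, not the paper's exact procedure.

```python
# Minimal sketch (assumed implementation, not the authors' code):
# score a sentence and its social-group counterfactuals with a generative LM
# and keep only counterfactuals whose likelihood is close to the original.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2")
lm.eval()

def log_likelihood(sentence: str) -> float:
    """Total log-likelihood of a sentence under the generative language model."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        out = lm(enc.input_ids, labels=enc.input_ids)
    # out.loss is the mean negative log-likelihood over the predicted tokens,
    # so multiply by the number of predicted positions to get a total score.
    return -out.loss.item() * (enc.input_ids.size(1) - 1)

def make_counterfactuals(sentence: str, group: str, other_groups):
    """Generate counterfactuals by swapping the mentioned social group token.
    (Illustrative: real pipelines would handle tokenization and plurals.)"""
    return [sentence.replace(group, g) for g in other_groups]

def similar_counterfactuals(sentence, group, other_groups, threshold=5.0):
    """Keep counterfactuals whose LM log-likelihood is within `threshold`
    of the original sentence; `threshold` is a hypothetical hyperparameter."""
    base = log_likelihood(sentence)
    return [
        cf for cf in make_counterfactuals(sentence, group, other_groups)
        if abs(log_likelihood(cf) - base) <= threshold
    ]
```

During training, the retained counterfactuals would then be paired with the original sentence, e.g., by penalizing differences between the hate speech classifier's predictions on the two, so that model behavior is equalized only where the counterfactual plausibly preserves the sentence's meaning.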

