Are fairness metric scores enough to assess discrimination biases in machine learning?

by Fanny Jourdan, et al.

This paper presents novel experiments shedding light on the shortcomings of current metrics for assessing gender discrimination biases produced by machine learning algorithms on textual data. We focus on the Bios dataset, where the learning task is to predict an individual's occupation from their biography. Such prediction tasks are common in commercial Natural Language Processing (NLP) applications such as automatic job recommendation. We address an important limitation of theoretical discussions of group-wise fairness metrics: they focus on large datasets, although the norm in many industrial NLP applications is to use small to reasonably large linguistic datasets, for which the main practical constraint is to achieve good prediction accuracy. We then question how reliable different popular measures of bias are when the size of the training set is just sufficient to learn reasonably accurate predictions. Our experiments sample the Bios dataset and train more than 200 models on different sample sizes. This allows us to study our results statistically and to confirm that common gender bias indices provide diverging and sometimes unreliable results when applied to relatively small training and test samples. This highlights the crucial importance of variance calculations for providing sound results in this field.
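To make the abstract's point concrete, here is a minimal illustrative sketch (not the authors' code) of the kind of analysis it describes: computing a common group-wise fairness metric, the true-positive-rate gap between two gender groups, together with a bootstrap estimate of its variance. The function names and the synthetic setup are assumptions for illustration; on small samples, the standard error can be large relative to the gap itself, which is the unreliability the paper highlights.

```python
import random

def tpr_gap(y_true, y_pred, group):
    """True-positive-rate gap between group 0 and group 1: a common
    group-wise fairness metric (equality of opportunity). Larger
    absolute values indicate a stronger gender bias."""
    def tpr(g):
        # Predictions for members of group g whose true label is positive.
        hits = [p for t, p, s in zip(y_true, y_pred, group) if s == g and t == 1]
        return sum(hits) / len(hits) if hits else 0.0
    return tpr(0) - tpr(1)

def bootstrap_se(y_true, y_pred, group, n_boot=1000, seed=0):
    """Bootstrap standard error of the metric, so a variance estimate
    can be reported alongside the point estimate on small samples."""
    rng = random.Random(seed)
    n = len(y_true)
    vals = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        vals.append(tpr_gap([y_true[i] for i in idx],
                            [y_pred[i] for i in idx],
                            [group[i] for i in idx]))
    mean = sum(vals) / n_boot
    var = sum((v - mean) ** 2 for v in vals) / (n_boot - 1)
    return var ** 0.5
```

For example, with 10 positive examples per group where the model recovers 8/10 positives in group 0 but only 6/10 in group 1, the point estimate of the gap is 0.2, and `bootstrap_se` quantifies how much that estimate would fluctuate under resampling of such a small test set.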




