Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection

04/16/2020

by Shauli Ravfogel, et al.

The ability to control the kinds of information encoded in neural representations has a variety of use cases, especially in light of the challenge of interpreting these models. We present Iterative Null-space Projection (INLP), a novel method for removing information from neural representations. The method repeatedly trains linear classifiers to predict the property we aim to remove, then projects the representations onto the classifiers' null-space. This renders the classifiers oblivious to the target property and makes it hard to linearly separate the data according to it. While applicable in general settings, we evaluate the method on bias and fairness use cases and show that it mitigates bias in word embeddings and increases fairness in a multi-class classification setting.
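The iterative procedure the abstract describes can be sketched in a few lines of NumPy and scikit-learn. This is a minimal illustration, not the authors' reference implementation: the names `inlp` and `nullspace_projection` are chosen here for clarity, and logistic regression stands in for whatever linear classifier one prefers.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def nullspace_projection(W, tol=1e-10):
    """Orthogonal projection onto the null-space of W (shape (k, d))."""
    _, s, Vt = np.linalg.svd(W, full_matrices=False)
    B = Vt[s > tol]                      # orthonormal basis of W's row-space
    return np.eye(W.shape[1]) - B.T @ B  # I - projection onto row-space

def inlp(X, y, n_iters=5):
    """Return a (d, d) matrix P such that applying P to the rows of X
    makes the property y hard to predict with a linear classifier."""
    d = X.shape[1]
    P = np.eye(d)
    Xp = X
    for _ in range(n_iters):
        # Train a linear probe for the property on the projected data,
        # then remove the direction(s) it relies on.
        clf = LogisticRegression(max_iter=1000).fit(Xp, y)
        P = nullspace_projection(clf.coef_) @ P
        Xp = X @ P.T
    return P
```

Each iteration removes one direction (per class) that a linear probe uses, so after a few rounds a freshly trained probe on `X @ P.T` should perform near chance, which is the sense in which the classifiers become "oblivious" to the property.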


