Certifying Data-Bias Robustness in Linear Regression

06/07/2022
by Anna P. Meyer, et al.

Datasets typically contain inaccuracies due to human error and societal biases, and these inaccuracies can affect the outcomes of models trained on such datasets. We present a technique for certifying whether linear regression models are pointwise-robust to label bias in the training dataset, i.e., whether bounded perturbations to the labels of a training dataset result in models that change the prediction of test points. We show how to solve this problem exactly for individual test points, and provide an approximate but more scalable method that does not require advance knowledge of the test point. We extensively evaluate both techniques and find that linear models – both regression- and classification-based – often display high levels of bias-robustness. However, we also unearth gaps in bias-robustness, such as high levels of non-robustness for certain bias assumptions on some datasets. Overall, our approach can serve as a guide for when to trust, or question, a model's output.
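
To make the pointwise question concrete, here is a minimal sketch, not the authors' implementation. It assumes ordinary least squares with an invertible Gram matrix, and a hypothetical bias model in which at most `k` training labels are each shifted by at most `delta`. Under those assumptions the prediction at a test point `x` is linear in the labels, so its exact range follows from the `k` largest label weights. All function names and the tolerance `eps` are illustrative.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def solve(A: Matrix, b: Vector) -> Vector:
    """Solve A w = b by Gauss-Jordan elimination with partial pivoting.

    Assumes A is square and invertible (no singularity check).
    """
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        # Move the row with the largest pivot candidate into position c.
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * pc for a, pc in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def label_weights(X: Matrix, x: Vector) -> Vector:
    """Return z = X (X^T X)^{-1} x, so the OLS prediction at x is sum_i z[i] * y[i]."""
    d = len(x)
    gram = [[sum(row[a] * row[b] for row in X) for b in range(d)] for a in range(d)]
    u = solve(gram, x)
    return [sum(rj * uj for rj, uj in zip(row, u)) for row in X]


def prediction_range(X: Matrix, y: Vector, x: Vector,
                     delta: float, k: int) -> Tuple[float, float]:
    """Exact range of the prediction at x when at most k labels move by at most delta."""
    z = label_weights(X, x)
    pred = sum(zi * yi for zi, yi in zip(z, y))
    # Worst case: shift the k labels with the largest |z_i| by delta, signs aligned with z_i.
    slack = delta * sum(sorted((abs(zi) for zi in z), reverse=True)[:k])
    return pred - slack, pred + slack


def certify(X: Matrix, y: Vector, x: Vector,
            delta: float, k: int, eps: float) -> bool:
    """True if no allowed label perturbation moves the prediction at x by more than eps."""
    lo, hi = prediction_range(X, y, x, delta, k)
    return (hi - lo) / 2.0 <= eps
```

For example, with rows `[1, 0]`, `[1, 1]`, `[1, 2]`, `[1, 3]` (an intercept column plus one feature), labels `[0, 1, 2, 3]`, and test point `[1, 1.5]`, every training label carries weight 0.25, so shifting two labels by up to 1 moves the prediction by at most 0.5. For a classifier built on the regression output, the same interval could instead be checked against the decision threshold.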

research
04/28/2018

Novel Prediction Techniques Based on Clusterwise Linear Regression

In this paper we explore different regression models based on Clusterwis...
research
10/08/2021

Certifying Robustness to Programmable Data Bias in Decision Trees

Datasets can be biased due to societal inequities, human biases, under-r...
research
02/24/2023

UnbiasedNets: A Dataset Diversification Framework for Robustness Bias Alleviation in Neural Networks

Performance of trained neural network (NN) models, in terms of testing a...
research
08/06/2019

Debiasing Linear Prediction

Standard methods in supervised learning separate training and prediction...
research
04/20/2023

The Dataset Multiplicity Problem: How Unreliable Data Impacts Predictions

We introduce dataset multiplicity, a way to study how inaccuracies, unce...
research
03/08/2023

Certifiable Robustness for Naive Bayes Classifiers

Data cleaning is crucial but often laborious in most machine learning (M...
research
10/09/2020

CryptoCredit: Securely Training Fair Models

When developing models for regulated decision making, sensitive features...
