Approaching Machine Learning Fairness through Adversarial Network
Fairness is becoming a rising concern w.r.t. machine learning model performance. Especially for sensitive fields such as criminal justice and loan decision, eliminating the prediction discrimination towards a certain group of population (characterized by sensitive features like race and gender) is important for enhancing the trustworthiness of model. In this paper, we present a new general framework to improve machine learning fairness. The goal of our model is to minimize the influence of sensitive feature from the perspectives of both the data input and the predictive model. In order to achieve this goal, we reformulate the data input by removing the sensitive information and strengthen model fairness by minimizing the marginal contribution of the sensitive feature. We propose to learn the non-sensitive input via sampling among features and design an adversarial network to minimize the dependence between the reformulated input and the sensitive information. Extensive experiments on three benchmark datasets suggest that our model achieve better results than related state-of-the-art methods with respect to both fairness metrics and prediction performance.
READ FULL TEXT