Fairness via Adversarial Attribute Neighbourhood Robust Learning
Improving fairness between privileged and less-privileged sensitive attribute groups (e.g, race, gender) has attracted lots of attention. To enhance the model performs uniformly well in different sensitive attributes, we propose a principled Robust Adversarial Attribute Neighbourhood (RAAN) loss to debias the classification head and promote a fairer representation distribution across different sensitive attribute groups. The key idea of RAAN is to mitigate the differences of biased representations between different sensitive attribute groups by assigning each sample an adversarial robust weight, which is defined on the representations of adversarial attribute neighbors, i.e, the samples from different protected groups. To provide efficient optimization algorithms, we cast the RAAN into a sum of coupled compositional functions and propose a stochastic adaptive (Adam-style) and non-adaptive (SGD-style) algorithm framework SCRAAN with provable theoretical guarantee. Extensive empirical studies on fairness-related benchmark datasets verify the effectiveness of the proposed method.
READ FULL TEXT