Fairness in Risk Assessment Instruments: Post-Processing to Achieve Counterfactual Equalized Odds
Algorithmic fairness is a topic of increasing concern both within research communities and among the general public. Conventional fairness criteria place restrictions on the joint distribution of a sensitive feature A, an outcome Y, and a predictor S. For example, the criterion of equalized odds requires that S be conditionally independent of A given Y, or equivalently, when all three variables are binary, that the false positive and false negative rates of the predictor be the same for both levels of A. However, fairness criteria based on the observed outcome Y are misleading when applied to Risk Assessment Instruments (RAIs), such as predictors designed to estimate the risk of recidivism or child neglect. It has been argued instead that RAIs ought to be trained and evaluated with respect to potential outcomes Y^0. Here, Y^0 represents the outcome that would be observed under no intervention: for example, whether recidivism would occur if a defendant were released pretrial. In this paper, we develop a method to post-process an existing binary predictor to satisfy approximate counterfactual equalized odds, which requires S to be nearly conditionally independent of A given Y^0, within a tolerance specified by the user. Our predictor converges to an optimal fair predictor at √n rates under appropriate assumptions. We propose doubly robust estimators of the risk and fairness properties of a fixed post-processed predictor, and we show that they are √n-consistent and asymptotically normal under appropriate assumptions.
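The abstract names three ingredients: counterfactual error rates defined with respect to Y^0, a randomized post-processing of a binary predictor, and doubly robust estimation. The sketch below illustrates them under stated assumptions and is not the paper's exact estimator. It assumes a binary intervention D (D = 0 meaning no intervention), that S and A are functions of the covariates X, and the usual identification conditions (consistency, no unmeasured confounding given X, positivity, so pi < 1). The nuisance estimates mu0 ≈ P(Y = 1 | X, D = 0) and pi ≈ P(D = 1 | X) are taken as given inputs. The randomized "derived predictor" (output 1 with probability theta1 when S = 1 and theta0 when S = 0) follows the Hardt et al.-style construction. The tolerance check, which bounds both error-rate gaps by eps, is also an assumed form. All function names are hypothetical.

```python
from typing import Sequence, Tuple


def dr_pseudo_outcome(y: float, d: int, mu0: float, pi: float) -> float:
    """Doubly robust (AIPW-style) pseudo-outcome standing in for Y^0.

    mu0 and pi are hypothetical nuisance inputs: estimates of
    P(Y = 1 | X, D = 0) and P(D = 1 | X), respectively.
    """
    # Outcome-model prediction plus an inverse-probability-weighted residual,
    # which is nonzero only for units that received no intervention (d == 0).
    return mu0 + (1 - d) / (1 - pi) * (y - mu0)


def counterfactual_error_rates(
    s: Sequence[int],
    a: Sequence[int],
    y: Sequence[int],
    d: Sequence[int],
    mu0: Sequence[float],
    pi: Sequence[float],
    group: int,
) -> Tuple[float, float]:
    """Ratio-of-means estimates of (cFPR, cFNR) for binary predictor s within A == group.

    cFPR = P(S = 1 | Y^0 = 0, A = group); cFNR = P(S = 0 | Y^0 = 1, A = group).
    """
    num_fp = den_neg = num_fn = den_pos = 0.0
    for s_i, a_i, y_i, d_i, m_i, p_i in zip(s, a, y, d, mu0, pi):
        if a_i != group:
            continue
        phi = dr_pseudo_outcome(y_i, d_i, m_i, p_i)  # estimate of Y^0 for this unit
        num_fp += s_i * (1 - phi)   # contributes to E[S (1 - Y^0) | A = group]
        den_neg += 1 - phi          # contributes to E[1 - Y^0 | A = group]
        num_fn += (1 - s_i) * phi   # contributes to E[(1 - S) Y^0 | A = group]
        den_pos += phi              # contributes to E[Y^0 | A = group]
    return num_fp / den_neg, num_fn / den_pos


def derived_rates(fpr: float, fnr: float, theta1: float, theta0: float) -> Tuple[float, float]:
    """Error rates of a randomized post-processed predictor that outputs 1 with
    probability theta1 when S = 1 and with probability theta0 when S = 0."""
    new_fpr = theta1 * fpr + theta0 * (1 - fpr)
    new_fnr = 1 - (theta1 * (1 - fnr) + theta0 * fnr)
    return new_fpr, new_fnr


def within_tolerance(
    rates0: Tuple[float, float], rates1: Tuple[float, float], eps: float
) -> bool:
    """Assumed form of approximate counterfactual equalized odds:
    both the cFPR gap and the cFNR gap are at most eps."""
    return abs(rates0[0] - rates1[0]) <= eps and abs(rates0[1] - rates1[1]) <= eps


if __name__ == "__main__":
    # Toy data with no interventions (d = 0, pi = 0), so each pseudo-outcome equals y.
    s = [1, 0, 1, 0, 1, 1, 1, 0]
    a = [0, 0, 0, 0, 1, 1, 1, 1]
    y = [1, 1, 0, 0, 1, 1, 0, 0]
    d = [0] * 8
    mu0 = [0.5] * 8
    pi = [0.0] * 8
    r0 = counterfactual_error_rates(s, a, y, d, mu0, pi, group=0)
    r1 = counterfactual_error_rates(s, a, y, d, mu0, pi, group=1)
    print(r0, r1, within_tolerance(r0, r1, eps=0.1))
```

Because the derived predictor's error rates are linear in (theta1, theta0) for each group, choosing these probabilities to minimize risk subject to the tolerance constraint becomes a small linear program. The ratio-of-means form is one simple way to turn the doubly robust pseudo-outcomes into conditional rates. The paper's own estimators and constraint formulation may differ.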