Fairwashing Explanations with Off-Manifold Detergent

07/20/2020
by Christopher J. Anders, et al.

Explanation methods promise to make black-box classifiers more transparent. As a result, it is hoped that they can act as proof for a sensible, fair and trustworthy decision-making process of the algorithm and thereby increase its acceptance by the end-users. In this paper, we show both theoretically and experimentally that these hopes are presently unfounded. Specifically, we show that, for any classifier g, one can always construct another classifier g̃ which has the same behavior on the data (same train, validation, and test error) but has arbitrarily manipulated explanation maps. We derive this statement theoretically using differential geometry and demonstrate it experimentally for various explanation methods, architectures, and datasets. Motivated by our theoretical insights, we then propose a modification of existing explanation methods which makes them significantly more robust.
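
The central claim, that g and g̃ can agree on all data while their explanation maps differ, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the data lie on the line x2 = 0 in the plane, the explanation map is a plain finite-difference gradient, and the names `g_tilde`, `gradient`, and `project_on_tangent` are hypothetical. Dropping the off-manifold component at the end is shown only as one possible remedy for this toy case, not as the modification the paper proposes.

```python
# Toy data manifold (assumed): points in R^2 that all lie on the line x2 = 0.
data = [(-2.0, 0.0), (-0.5, 0.0), (0.3, 0.0), (1.7, 0.0)]


def g(x):
    """Original classifier score; it depends only on x1."""
    return 3.0 * x[0]


def g_tilde(x, c=100.0):
    """Manipulated classifier: adds a term that is zero wherever x2 = 0."""
    return g(x) + c * x[1]


def gradient(f, x, eps=1e-6):
    """Central finite-difference gradient, used here as the explanation map."""
    grads = []
    for i in range(len(x)):
        hi, lo = list(x), list(x)
        hi[i] += eps
        lo[i] -= eps
        grads.append((f(hi) - f(lo)) / (2 * eps))
    return grads


def project_on_tangent(grad, tangent=(1.0, 0.0)):
    """Keep only the gradient component along the (unit) manifold direction."""
    dot = sum(gi * ti for gi, ti in zip(grad, tangent))
    return [dot * ti for ti in tangent]


# Both classifiers give identical outputs on every data point ...
same_outputs = all(g(x) == g_tilde(x) for x in data)

# ... yet their gradient explanations differ in the off-manifold direction x2.
expl_g = gradient(g, data[0])
expl_g_tilde = gradient(g_tilde, data[0])

# Restricted to the manifold direction, the two explanations agree again.
proj_g = project_on_tangent(expl_g)
proj_g_tilde = project_on_tangent(expl_g_tilde)
```

In this sketch the added term `c * x[1]` never changes a prediction on the data, so train, validation, and test error would be unaffected, while the constant `c` sets the x2 entry of the explanation to whatever value is wanted.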

research
10/05/2022

Explanation Uncertainty with Decision Boundary Awareness

Post-hoc explanation methods have become increasingly depended upon for ...
research
06/19/2019

Explanations can be manipulated and geometry is to blame

Explanation methods aim to make neural networks more trustworthy and int...
research
06/07/2022

Fooling Explanations in Text Classifiers

State-of-the-art text classification models are becoming increasingly re...
research
02/01/2022

Framework for Evaluating Faithfulness of Local Explanations

We study the faithfulness of an explanation system to the underlying pre...
research
01/28/2021

Explaining Natural Language Processing Classifiers with Occlusion and Language Modeling

Deep neural networks are powerful statistical learners. However, their p...
research
12/02/2019

EMAP: Explanation by Minimal Adversarial Perturbation

Modern instance-based model-agnostic explanation methods (LIME, SHAP, L2...
research
07/06/2020

Making Fair ML Software using Trustworthy Explanation

Machine learning software is being used in many applications (finance, h...
