Text-to-Image Models for Counterfactual Explanations: a Black-Box Approach

09/14/2023
by Guillaume Jeanneret, et al.

This paper addresses the challenge of generating Counterfactual Explanations (CEs): identifying and modifying the fewest features necessary to alter a classifier's prediction for a given image. Our proposed method, Text-to-Image Models for Counterfactual Explanations (TIME), is a black-box counterfactual technique based on distillation. Unlike previous methods, this approach requires only the image and its prediction; it does not need the classifier's structure, parameters, or gradients. Before generating the counterfactuals, TIME introduces two distinct biases into Stable Diffusion in the form of textual embeddings: the context bias, associated with the image's structure, and the class bias, linked to class-specific features learned by the target classifier. After learning these biases, we find the optimal latent code by applying the classifier's predicted class token, and then regenerate the image using the target embedding as conditioning, which produces the counterfactual explanation. Extensive empirical studies validate that TIME can generate explanations of comparable effectiveness even when operating within a black-box setting.
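The pipeline described in the abstract (learn a context bias and a class bias, invert the image under the predicted class, regenerate under the target class) can be illustrated with a deliberately tiny toy. This is a sketch under strong assumptions, not the authors' implementation: Stable Diffusion is replaced by a hypothetical additive "generator", the textual embeddings are replaced by mean vectors, and every name (`black_box_classifier`, `learn_biases`, `invert`, `generate`, `counterfactual`) is invented for illustration. The only property it shares with the method is that it uses nothing from the classifier except its predicted labels.

```python
import numpy as np

def black_box_classifier(x):
    """Hypothetical black box: returns a label only, no gradients or weights."""
    return int(x[0] > 0.0)

def learn_biases(images, labels):
    """Toy stand-in for learning the two embeddings from images and predictions.

    context bias: what all images share (here, the global mean).
    class bias:   what images predicted as class k add on top of the context.
    """
    context = images.mean(axis=0)
    class_bias = {
        int(k): images[labels == k].mean(axis=0) - context
        for k in np.unique(labels)
    }
    return context, class_bias

def invert(x, context, bias):
    """Toy inversion: latent code of x under the given conditioning (assumed additive)."""
    return x - (context + bias)

def generate(z, context, bias):
    """Toy generation: decode a latent code under the given conditioning."""
    return z + (context + bias)

def counterfactual(x, target, context, class_bias):
    """Invert under the predicted class, then regenerate under the target class."""
    source = black_box_classifier(x)  # the only access to the classifier
    z = invert(x, context, class_bias[source])
    return generate(z, context, class_bias[target])

# Synthetic "images": 3-dimensional vectors, labelled only by querying the black box.
rng = np.random.default_rng(0)
images = rng.normal(size=(1000, 3))
labels = np.array([black_box_classifier(im) for im in images])

context, class_bias = learn_biases(images, labels)

x = np.array([0.8, 0.3, -0.5])
x_cf = counterfactual(x, target=0, context=context, class_bias=class_bias)
print(black_box_classifier(x), black_box_classifier(x_cf))
```

In this toy, the counterfactual differs from the input mainly along the feature the classifier relies on, while the remaining features stay close to their original values. That mirrors, in a very reduced form, the goal of changing as few features as possible.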

research
06/16/2021

Counterfactual Graphs for Explainable Classification of Brain Networks

Training graph classifiers able to distinguish between healthy brains an...
research
03/25/2021

ECINN: Efficient Counterfactuals from Invertible Neural Networks

Counterfactual examples identify how inputs can be altered to change the...
research
07/15/2022

CheXplaining in Style: Counterfactual Explanations for Chest X-rays using StyleGAN

Deep learning models used in medical image analysis are prone to raising...
research
08/18/2021

CARE: Coherent Actionable Recourse based on Sound Counterfactual Explanations

Counterfactual explanation methods interpret the outputs of a machine le...
research
05/23/2022

What You See is What You Classify: Black Box Attributions

An important step towards explaining deep image classifiers lies in the ...
research
10/22/2021

Text Counterfactuals via Latent Optimization and Shapley-Guided Search

We study the problem of generating counterfactual text for a classifier ...
research
09/27/2019

Interpreting Undesirable Pixels for Image Classification on Black-Box Models

In an effort to interpret black-box models, researches for developing ex...
