PatClArC: Using Pattern Concept Activation Vectors for Noise-Robust Model Debugging

by Frederik Pahde, et al.

State-of-the-art machine learning models are commonly (pre-)trained on large benchmark datasets. These datasets often contain biases, artifacts, or errors that went unnoticed during data collection, and they therefore fail to represent the real world truthfully. As a result, models trained on them can learn undesired behavior based on spurious correlations, e.g., the presence of a copyright tag in an image. Concept Activation Vectors (CAVs) have been proposed as a tool to model known concepts in latent space and have been used for concept sensitivity testing and model correction. Specifically, class artifact compensation (ClArC) corrects models by using CAVs to represent data artifacts linearly in feature space. Modeling CAVs with the filters of linear models, however, makes them strongly susceptible to the noise portion of the data: recent work argues that linear model filters are unsuitable for finding the signal direction in the input, a problem that can be avoided by using patterns instead. In this paper we propose Pattern Concept Activation Vectors (PCAVs) for noise-robust concept representations in latent space. We demonstrate that pattern-based artifact modeling benefits the application of CAVs as a means of removing the influence of confounding features from models via the ClArC framework.
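The difference between a filter and a pattern can be illustrated on toy data. The sketch below is not the paper's implementation: the two-dimensional "activations", the variable names, the use of least squares as the linear model, and the covariance-based pattern estimate are all assumptions made for illustration. A binary concept label shifts the activations along a known signal direction, while a strong distractor unrelated to the label adds noise along another direction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical toy "latent activations": concept signal plus a strong distractor.
t = rng.integers(0, 2, size=n).astype(float)          # concept label (artifact present or not)
signal_dir = np.array([1.0, 0.0])                      # true concept direction (assumed known here)
distractor_dir = np.array([1.0, 1.0]) / np.sqrt(2.0)   # noise direction, independent of t
eps = rng.normal(0.0, 2.0, size=n)                     # distractor strength per sample
acts = (np.outer(t, signal_dir)
        + np.outer(eps, distractor_dir)
        + rng.normal(0.0, 0.05, size=(n, 2)))


def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)


centered = acts - acts.mean(axis=0)
t_centered = t - t.mean()

# Filter-based CAV: weights of a linear model that predicts t from the activations.
# Least squares stands in for whichever linear classifier would be used in practice.
filter_cav, *_ = np.linalg.lstsq(centered, t_centered, rcond=None)

# Pattern-based CAV (assumed estimator): covariance between activations and label,
# divided by the label variance.
pattern_cav = centered.T @ t_centered / (t_centered @ t_centered)

cos_filter = float(unit(filter_cav) @ signal_dir)
cos_pattern = float(unit(pattern_cav) @ signal_dir)
print(f"cosine(filter, signal)  = {cos_filter:.3f}")
print(f"cosine(pattern, signal) = {cos_pattern:.3f}")
```

In this setup the filter has to tilt away from the signal direction in order to cancel the distractor, so it encodes the noise structure as well as the concept. The pattern only measures how the activations co-vary with the label and stays close to the signal direction, which is the property the abstract refers to as noise robustness.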



From Hope to Safety: Unlearning Biases of Deep Models by Enforcing the Right Reasons in Latent Space

Deep Neural Networks are prone to learning spurious correlations embedde...

Estimation of User's World Model Using Graph2vec

To obtain advanced interaction between autonomous robots and users, robo...

Simple Text Detoxification by Identifying a Linear Toxic Subspace in Language Model Embeddings

Large pre-trained language models are often trained on large volumes of ...

Improving Generalizability in Implicitly Abusive Language Detection with Concept Activation Vectors

Robustness of machine learning models on ever-changing real-world data i...

Improving Interpretability of CNN Models Using Non-Negative Concept Activation Vectors

Convolutional neural network (CNN) models for computer vision are powerf...

Concept-based explainability for an EEG transformer model

Deep learning models are complex due to their size, structure, and inher...

Text-To-Concept (and Back) via Cross-Model Alignment

We observe that the mapping between an image's representation in one mod...
