BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection

by   Tinghao Xie, et al.

We present a novel defense, against backdoor attacks on Deep Neural Networks (DNNs), wherein adversaries covertly implant malicious behaviors (backdoors) into DNNs. Our defense falls within the category of post-development defenses that operate independently of how the model was generated. The proposed defense is built upon a novel reverse engineering approach that can directly extract backdoor functionality of a given backdoored model to a backdoor expert model. The approach is straightforward – finetuning the backdoored model over a small set of intentionally mislabeled clean samples, such that it unlearns the normal functionality while still preserving the backdoor functionality, and thus resulting in a model (dubbed a backdoor expert model) that can only recognize backdoor inputs. Based on the extracted backdoor expert model, we show the feasibility of devising highly accurate backdoor input detectors that filter out the backdoor inputs during model inference. Further augmented by an ensemble strategy with a finetuned auxiliary model, our defense, BaDExpert (Backdoor Input Detection with Backdoor Expert), effectively mitigates 16 SOTA backdoor attacks while minimally impacting clean utility. The effectiveness of BaDExpert has been verified on multiple datasets (CIFAR10, GTSRB and ImageNet) across various model architectures (ResNet, VGG, MobileNetV2 and Vision Transformer).


page 1

page 2

page 3

page 4


Reconstructive Neuron Pruning for Backdoor Defense

Deep neural networks (DNNs) have been found to be vulnerable to backdoor...

Defending Against Model Stealing Attacks with Adaptive Misinformation

Deep Neural Networks (DNNs) are susceptible to model stealing attacks, w...

Backdoor Cleansing with Unlabeled Data

Due to the increasing computational demand of Deep Neural Networks (DNNs...

Don't FREAK Out: A Frequency-Inspired Approach to Detecting Backdoor Poisoned Samples in DNNs

In this paper we investigate the frequency sensitivity of Deep Neural Ne...

Poison as a Cure: Detecting Neutralizing Variable-Sized Backdoor Attacks in Deep Neural Networks

Deep learning models have recently shown to be vulnerable to backdoor po...

NNoculation: Broad Spectrum and Targeted Treatment of Backdoored DNNs

This paper proposes a novel two-stage defense (NNoculation) against back...

Optimal Smoothing Distribution Exploration for Backdoor Neutralization in Deep Learning-based Traffic Systems

Deep Reinforcement Learning (DRL) enhances the efficiency of Autonomous ...

Please sign up or login with your details

Forgot password? Click here to reset