DECK: Model Hardening for Defending Pervasive Backdoors

06/18/2022
by Guanhong Tao, et al.

Pervasive backdoors are triggered by dynamic and pervasive input perturbations. They can be intentionally injected by attackers or naturally exist in normally trained models. They have a different nature from the traditional static and localized backdoors that can be triggered by perturbing a small input area with some fixed pattern, e.g., a patch with solid color. Existing defense techniques are highly effective for traditional backdoors. However, they may not work well for pervasive backdoors, especially regarding backdoor removal and model hardening. In this paper, we propose a novel model hardening technique against pervasive backdoors, including both natural and injected backdoors. We develop a general pervasive attack based on an encoder-decoder architecture enhanced with a special transformation layer. The attack can model a wide range of existing pervasive backdoor attacks and quantify them by class distances. As such, using the samples derived from our attack in adversarial training can harden a model against these backdoor vulnerabilities. Our evaluation on 9 datasets with 15 model structures shows that our technique can enlarge class distances by 59.65% on average with less than 1% accuracy degradation and no robustness degradation, substantially outperforming five hardening techniques such as adversarial training, universal adversarial training, MOTH, etc. It can reduce the attack success rate of six pervasive backdoor attacks from 99.06% to 1.94%, surpassing seven state-of-the-art backdoor removal techniques.
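To make the pipeline in the abstract concrete, the PyTorch sketch below pairs a toy encoder-decoder perturbation generator with an adversarial-training step. It is a minimal illustration under stated assumptions, not the paper's implementation: the generator architecture, the regularization weight, `eps`, and the loss mix are placeholders chosen for brevity, and the paper's special transformation layer is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PerturbationGenerator(nn.Module):
    """Toy encoder-decoder that outputs an input-wide (pervasive)
    perturbation. A stand-in for the paper's generator; the real
    model additionally uses a special transformation layer."""
    def __init__(self, channels=3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, channels, 4, stride=2, padding=1),
            nn.Tanh(),  # bound each perturbation value to [-1, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def pervasive_attack(generator, images, eps=0.1):
    """Add a generated, image-wide perturbation, scaled by eps."""
    delta = eps * generator(images)
    return torch.clamp(images + delta, 0.0, 1.0), delta

def class_distance(delta):
    """Rough proxy for the paper's class distance: the magnitude of
    the smallest pervasive perturbation flipping victim to target."""
    return delta.abs().mean().item()

def harden_step(model, generator, gen_opt, model_opt, images, labels, target):
    # 1) Train the generator to realize a victim -> target pervasive
    #    attack while keeping the perturbation small.
    adv, delta = pervasive_attack(generator, images)
    target_labels = torch.full_like(labels, target)
    atk_loss = F.cross_entropy(model(adv), target_labels) \
             + 1e-3 * delta.abs().mean()
    gen_opt.zero_grad()
    atk_loss.backward()
    gen_opt.step()

    # 2) Harden the model on the generated samples (adversarial
    #    training), mixing in clean samples to limit accuracy loss.
    adv, _ = pervasive_attack(generator, images)
    hard_loss = F.cross_entropy(model(adv.detach()), labels) \
              + F.cross_entropy(model(images), labels)
    model_opt.zero_grad()
    hard_loss.backward()
    model_opt.step()
```

Iterating the two steps over class pairs with small class distances mirrors the hardening loop described in the abstract: the generator exposes pervasive backdoor vulnerabilities, and retraining on its outputs enlarges the corresponding class distances.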
