Neural Polarizer: A Lightweight and Effective Backdoor Defense via Purifying Poisoned Features

06/29/2023
by Mingli Zhu, et al.

Recent studies have demonstrated the susceptibility of deep neural networks to backdoor attacks. Given a backdoored model, its prediction on a poisoned sample carrying the trigger is dominated by the trigger information, even though trigger information and benign information coexist in the sample. Inspired by optical polarizers, which pass light waves with particular polarizations while filtering out waves with other polarizations, we propose a novel backdoor defense that inserts a learnable neural polarizer into the backdoored model as an intermediate layer, purifying poisoned samples by filtering out trigger information while preserving benign information. The neural polarizer is instantiated as a single lightweight linear transformation layer, learned by solving a well-designed bi-level optimization problem on a limited clean dataset. Compared to other fine-tuning-based defenses, which typically adjust all parameters of the backdoored model, the proposed method only needs to learn one additional layer, making it more efficient and less dependent on clean data. Extensive experiments demonstrate the effectiveness and efficiency of our method in removing backdoors across various neural network architectures and datasets, especially when clean data is very limited.
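To make the idea concrete, below is a minimal sketch of how such a polarizer layer could be inserted into a frozen backdoored network. All names here (NeuralPolarizer, insert_polarizer, layer3, small_clean_loader) are hypothetical illustrations, not the authors' code: the polarizer is assumed to be a 1x1 convolution initialized to the identity, and the paper's bi-level objective (which also accounts for trigger filtering) is replaced with plain cross-entropy on the small clean set for simplicity.

```python
# Sketch only: insert a learnable linear "polarizer" layer into a frozen
# backdoored model and train just that layer on limited clean data.
import torch
import torch.nn as nn


class NeuralPolarizer(nn.Module):
    """Lightweight linear transform over channel features (1x1 conv),
    initialized to the identity so behavior is unchanged before training."""

    def __init__(self, channels: int):
        super().__init__()
        self.transform = nn.Conv2d(channels, channels, kernel_size=1, bias=True)
        nn.init.zeros_(self.transform.bias)
        with torch.no_grad():
            self.transform.weight.copy_(
                torch.eye(channels).view(channels, channels, 1, 1)
            )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.transform(x)


def insert_polarizer(backdoored_model: nn.Module, layer_name: str, channels: int):
    """Wrap the named submodule so its output passes through the polarizer;
    freeze everything except the polarizer's own parameters."""
    polarizer = NeuralPolarizer(channels)
    original = getattr(backdoored_model, layer_name)
    setattr(backdoored_model, layer_name, nn.Sequential(original, polarizer))
    for p in backdoored_model.parameters():
        p.requires_grad_(False)
    for p in polarizer.parameters():
        p.requires_grad_(True)
    return polarizer


# Usage sketch: `model` is a backdoored ResNet-like network whose block
# `layer3` outputs 256-channel feature maps (hypothetical architecture).
# polarizer = insert_polarizer(model, "layer3", channels=256)
# optimizer = torch.optim.Adam(polarizer.parameters(), lr=1e-3)
# for images, labels in small_clean_loader:  # limited clean data
#     loss = nn.functional.cross_entropy(model(images), labels)
#     optimizer.zero_grad(); loss.backward(); optimizer.step()
```

Because only the single inserted layer is trainable, the optimization touches far fewer parameters than full fine-tuning, which is what makes the defense practical with very little clean data.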


Related Research

04/24/2023 · Enhancing Fine-Tuning Based Backdoor Defense with Sharpness-Aware Minimization
Backdoor defense, which aims to detect or mitigate the effect of malicio...

06/18/2020 · Dissecting Deep Networks into an Ensemble of Generative Classifiers for Robust Predictions
Deep Neural Networks (DNNs) are often criticized for being susceptible t...

05/29/2019 · Super Interaction Neural Network
Recent studies have demonstrated that the convolutional networks heavily...

11/22/2022 · Backdoor Cleansing with Unlabeled Data
Due to the increasing computational demand of Deep Neural Networks (DNNs...

04/07/2021 · The art of defense: letting networks fool the attacker
Some deep neural networks are invariant to some input transformations, s...

02/24/2022 · Towards Effective and Robust Neural Trojan Defenses via Input Filtering
Trojan attacks on deep neural networks are both dangerous and surreptiti...

09/03/2021 · How to Inject Backdoors with Better Consistency: Logit Anchoring on Clean Data
Since training a large-scale backdoored model from scratch requires a la...
