NTD: Non-Transferability Enabled Backdoor Detection

by Yinshan Li, et al.

A backdoored deep learning (DL) model behaves normally on clean inputs but misbehaves on trigger inputs as the attacker desires, with severe consequences for DL model deployments. State-of-the-art defenses are either limited to specific backdoor attacks (source-agnostic attacks) or not user-friendly, in that they require machine learning (ML) expertise or expensive computing resources. This work observes that all existing backdoor attacks have an inevitable intrinsic weakness, non-transferability: a trigger input hijacks a backdoored model but is not effective against another model that has not been implanted with the same backdoor. Building on this key observation, we propose non-transferability enabled backdoor detection (NTD) to identify trigger inputs for a model-under-test (MUT) at run-time. Specifically, NTD allows a potentially backdoored MUT to predict a class for an input. Meanwhile, NTD leverages a feature extractor (FE) to extract feature vectors for the input and for a group of samples randomly picked from its predicted class, and then compares the similarity between the input and the samples in the FE's latent space. If the similarity is low, the input is flagged as an adversarial trigger input; otherwise, it is treated as benign. The FE is a free pre-trained model privately reserved from open platforms. As the FE and MUT come from different sources, the attacker is very unlikely to insert the same backdoor into both. Because of non-transferability, a trigger effect that works on the MUT cannot be transferred to the FE, making NTD effective against different types of backdoor attacks. We evaluate NTD on three popular customized tasks: face recognition, traffic sign recognition, and general animal classification. The results affirm that NTD has high effectiveness (low false acceptance rate) and usability (low false rejection rate) with low detection latency.
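The decision rule described in the abstract can be illustrated with a minimal sketch. The abstract does not say which similarity measure, aggregation, or threshold NTD uses, so cosine similarity, averaging over the reference samples, and the threshold value of 0.5 are all assumptions made for illustration. The function names are hypothetical, and the feature vectors are assumed to have already been produced by the FE.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two feature vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def mean_similarity(input_features: Vector,
                    reference_features: Sequence[Vector]) -> float:
    """Average similarity between the input and the reference samples.

    The reference samples stand for feature vectors of samples drawn
    from the class that the MUT predicted for the input.
    """
    if not reference_features:
        raise ValueError("at least one reference sample is required")
    total = sum(cosine_similarity(input_features, ref)
                for ref in reference_features)
    return total / len(reference_features)


def is_trigger_input(input_features: Vector,
                     reference_features: Sequence[Vector],
                     threshold: float = 0.5) -> bool:
    """Flag the input as a trigger input when similarity is low.

    The threshold is a hypothetical value, not one taken from the paper.
    """
    return mean_similarity(input_features, reference_features) < threshold


# Toy FE features for samples of the class predicted by the MUT.
class_references = [[1.0, 0.1], [0.9, 0.2]]

print(is_trigger_input([0.95, 0.15], class_references))  # close to the class
print(is_trigger_input([0.1, 1.0], class_references))    # far from the class
```

In this toy setup the first input lies close to the reference samples in feature space and is accepted, while the second lies far from them and is flagged. In practice the threshold would need to be calibrated on held-out clean data to balance false acceptance against false rejection.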




AntidoteRT: Run-time Detection and Correction of Poison Attacks on Neural Networks

We study backdoor poisoning attacks against image classification network...

On the Intriguing Connections of Regularization, Input Gradients and Transferability of Evasion and Poisoning Attacks

Transferability captures the ability of an attack against a machine-lear...

STRIP: A Defence Against Trojan Attacks on Deep Neural Networks

Recent trojan attacks on deep neural network (DNN) models are one insidi...

Model Agnostic Defence against Backdoor Attacks in Machine Learning

Machine Learning (ML) has automated a multitude of our day-to-day decisi...

Detection of Face Recognition Adversarial Attacks

Deep Learning methods have become state-of-the-art for solving tasks suc...

CASSOCK: Viable Backdoor Attacks against DNN in The Wall of Source-Specific Backdoor Defences

Backdoor attacks have been a critical threat to deep neural network (DNN...

Disrupting Adversarial Transferability in Deep Neural Networks

Adversarial attack transferability is a well-recognized phenomenon in de...
