MDTD: A Multi Domain Trojan Detector for Deep Neural Networks

by   Arezoo Rajabi, et al.

Machine learning models that use deep neural networks (DNNs) are vulnerable to backdoor attacks. An adversary carrying out a backdoor attack embeds a predefined perturbation called a trigger into a small subset of input samples and trains the DNN such that the presence of the trigger in the input results in an adversary-desired output class. Such adversarial retraining however needs to ensure that outputs for inputs without the trigger remain unaffected and provide high classification accuracy on clean samples. In this paper, we propose MDTD, a Multi-Domain Trojan Detector for DNNs, which detects inputs containing a Trojan trigger at testing time. MDTD does not require knowledge of trigger-embedding strategy of the attacker and can be applied to a pre-trained DNN model with image, audio, or graph-based inputs. MDTD leverages an insight that input samples containing a Trojan trigger are located relatively farther away from a decision boundary than clean samples. MDTD estimates the distance to a decision boundary using adversarial learning methods and uses this distance to infer whether a test-time input sample is Trojaned or not. We evaluate MDTD against state-of-the-art Trojan detection methods across five widely used image-based datasets: CIFAR100, CIFAR10, GTSRB, SVHN, and Flowers102; four graph-based datasets: AIDS, WinMal, Toxicant, and COLLAB; and the SpeechCommand audio dataset. MDTD effectively identifies samples that contain different types of Trojan triggers. We evaluate MDTD against adaptive attacks where an adversary trains a robust DNN to increase (decrease) distance of benign (Trojan) inputs from a decision boundary.


page 2

page 8


Anomaly Detection of Test-Time Evasion Attacks using Class-conditional Generative Adversarial Networks

Deep Neural Networks (DNNs) have been shown vulnerable to adversarial (T...

DeepCloak: Masking Deep Neural Network Models for Robustness Against Adversarial Samples

Recent studies have shown that deep neural networks (DNN) are vulnerable...

Neural Network Trojans Analysis and Mitigation from the Input Domain

Deep Neural Networks (DNNs) can learn Trojans (or backdoors) from benign...

Odyssey: Creation, Analysis and Detection of Trojan Models

Along with the success of deep neural network (DNN) models in solving va...

COLLIDER: A Robust Training Framework for Backdoor Data

Deep neural network (DNN) classifiers are vulnerable to backdoor attacks...

T-Miner: A Generative Approach to Defend Against Trojan Attacks on DNN-based Text Classification

Deep Neural Network (DNN) classifiers are known to be vulnerable to Troj...

Game of Trojans: A Submodular Byzantine Approach

Machine learning models in the wild have been shown to be vulnerable to ...

Please sign up or login with your details

Forgot password? Click here to reset