Distorting Neural Representations to Generate Highly Transferable Adversarial Examples

by   Muzammal Naseer, et al.

Deep neural networks (DNN) can be easily fooled by adding human imperceptible perturbations to the images. These perturbed images are known as the `adversarial examples' that pose a serious threat to security and safety critical systems. A litmus test for the strength of adversarial examples is their transferability across different DNN models in a black box setting (i.i. when target model's architecture and parameters are not known to attacker). Current attack algorithms that seek to enhance adversarial transferability work on the decision level i.e. generate perturbations that alter the network decisions. This leads to two key limitations: (a) An attack is dependent on the task-specific loss function (e.g. softmax cross-entropy for object recognition) and therefore does not generalize beyond its original task. (b) The adversarial examples are specific to the network architecture and demonstrate poor transferability to other network architectures. We propose a novel approach to create adversarial examples that can broadly fool different networks on multiple tasks. Our approach is based on the following intuition: "Deep features are highly generalizable and show excellent performance across different tasks, therefore an ideal attack must create maximum distortions in the feature space to realize highly transferable examples". Specifically, for an input image, we calculate perturbations that push its feature representations furthest away from the original image features. We report extensive experiments to show how adversarial examples generalize across multiple networks across classification, object detection and segmentation tasks.


page 4

page 8


Understanding and Enhancing the Transferability of Adversarial Examples

State-of-the-art deep neural networks are known to be vulnerable to adve...

Open Set Adversarial Examples

Adversarial examples in recent works target at closed set recognition sy...

Adversarial Examples for Semantic Segmentation and Object Detection

It has been well demonstrated that adversarial examples, i.e., natural i...

Fantastic DNN Classifiers and How to Identify them without Data

Current algorithms and architecture can create excellent DNN classifier ...

Towards Imperceptible Adversarial Image Patches Based on Network Explanations

The vulnerability of deep neural networks (DNNs) for adversarial example...

How the Softmax Output is Misleading for Evaluating the Strength of Adversarial Examples

Even before deep learning architectures became the de facto models for c...

Traits & Transferability of Adversarial Examples against Instance Segmentation & Object Detection

Despite the recent advancements in deploying neural networks for image c...

Please sign up or login with your details

Forgot password? Click here to reset