Robust Weight Signatures: Gaining Robustness as Easy as Patching Weights?

by Ruisi Cai, et al.

Given a robust model trained to be resilient to one or multiple types of distribution shifts (e.g., natural image corruptions), how is that "robustness" encoded in the model weights, and how easily can it be disentangled and/or "zero-shot" transferred to other models? This paper empirically suggests a surprisingly simple answer: linearly, by straightforward model weight arithmetic. We start by drawing several key observations: (1) assuming we train the same model architecture on both a clean dataset and its corrupted version, the resultant weights mostly differ in shallow layers; (2) the weight difference after projection, which we call a "Robust Weight Signature" (RWS), appears to be discriminative and indicative of different corruption types; (3) for the same corruption type, the RWSs obtained by one model architecture are highly consistent and transferable across different datasets. We propose a minimalistic model robustness "patching" framework that carries a model trained on clean data together with its pre-extracted RWSs. In this way, injecting certain robustness into the model is reduced to directly adding the corresponding RWS to its weights. We verify our proposed framework to be remarkably (1) lightweight: since RWSs concentrate on the shallowest few layers and, as we further show, can be painlessly quantized, storing an RWS is up to 13× more compact than storing a full weight copy; (2) in-situ adjustable: RWSs can be appended as needed and later taken off to restore the intact clean model, and we further demonstrate that one can linearly re-scale an RWS to control the patched robustness strength; (3) composable: multiple RWSs can be added simultaneously to patch more comprehensive robustness at once; and (4) transferable: even when the clean model backbone is continually adapted or updated, RWSs remain effective patches due to their outstanding cross-dataset transferability.
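The abstract above describes RWSs as plain weight differences that can be added, re-scaled, and subtracted. The paper's exact projection and layer-selection procedure is not given in this abstract, so the following is only a minimal sketch of the weight-arithmetic idea, with hypothetical helper names (`extract_rws`, `patch`) and toy two-layer weights standing in for a real network:

```python
import numpy as np

def extract_rws(clean_weights, robust_weights, shallow_layers):
    """Hypothetical RWS extraction: per-layer weight difference, kept only
    for the shallow layers where (per the abstract) it concentrates."""
    return {name: robust_weights[name] - clean_weights[name]
            for name in shallow_layers}

def patch(weights, rws, scale=1.0):
    """Add an (optionally re-scaled) RWS to the weights; patching with a
    negative scale takes the patch off again, restoring the clean model."""
    patched = dict(weights)
    for name, delta in rws.items():
        patched[name] = patched[name] + scale * delta
    return patched

# Toy example: two layers, only "conv1" counted as shallow.
clean  = {"conv1": np.array([1.0, 2.0]), "fc": np.array([5.0])}
robust = {"conv1": np.array([1.5, 1.0]), "fc": np.array([5.1])}

rws = extract_rws(clean, robust, shallow_layers=["conv1"])
half = patch(clean, rws, scale=0.5)       # half-strength robustness patch
restored = patch(half, rws, scale=-0.5)   # remove it: back to clean weights
```

Composability in this picture is just adding several such deltas to the same backbone; because only shallow-layer differences are stored (and, per the abstract, they quantize well), an RWS is far smaller than a full weight copy.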


