SSL-Cleanse: Trojan Detection and Mitigation in Self-Supervised Learning

03/16/2023
by Mengxin Zheng, et al.

Self-supervised learning (SSL) is a commonly used approach to learning and encoding data representations. By using a pre-trained SSL image encoder and training a downstream classifier on top of it, impressive performance can be achieved on various tasks with very little labeled data. The increasing usage of SSL has led to an uptick in security research related to SSL encoders and the development of various Trojan attacks. The danger posed by Trojan attacks inserted in SSL encoders lies in their ability to operate covertly and spread widely among various users and devices. Backdoor behavior in a Trojaned encoder can be inadvertently inherited by downstream classifiers, making the threat even more difficult to detect and mitigate. Although current Trojan detection methods in supervised learning can potentially safeguard SSL downstream classifiers, identifying and addressing triggers in the SSL encoder before its widespread dissemination is a challenging task. This is because, during SSL encoder Trojan detection, the downstream tasks are not always known, dataset labels are not available, and even the original training dataset is not accessible. This paper presents an innovative technique called SSL-Cleanse that is designed to detect and mitigate backdoor attacks in SSL encoders. We evaluated SSL-Cleanse on various datasets using 300 models, achieving an average detection success rate of 83.7%. After mitigating backdoors, backdoored encoders achieve, on average, a 0.24% attack success rate without great accuracy loss, proving the effectiveness of SSL-Cleanse.


