Self-Destructing Models: Increasing the Costs of Harmful Dual Uses in Foundation Models

11/27/2022
by Eric Mitchell, et al.

A growing ecosystem of large, open-source foundation models has reduced the labeled data and technical expertise necessary to apply machine learning to many new problems. Yet foundation models pose a clear dual-use risk, indiscriminately reducing the costs of building both harmful and beneficial machine learning systems. To mitigate this risk, we propose the task blocking paradigm, in which foundation models are trained with an additional mechanism to impede adaptation to harmful tasks while retaining good performance on desired tasks. We call the resulting models self-destructing models, inspired by mechanisms that prevent adversaries from using tools for harmful purposes. We present an algorithm for training self-destructing models leveraging techniques from meta-learning and adversarial learning, showing that it can largely prevent a BERT-based model from learning to perform gender identification without harming the model's ability to perform profession classification. We conclude with a discussion of future directions.
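
One way to make the task blocking goal concrete, offered here as an illustrative formalization rather than the paper's stated objective, is as a bi-level problem. All symbols below are assumptions introduced for illustration: \theta denotes the foundation model's parameters, L_desired and L_harmful the losses on the desired and harmful tasks, A_k(\theta) the parameters an adversary would reach after k gradient steps of fine-tuning on the harmful task starting from \theta, \eta the adversary's step size, and \lambda > 0 a trade-off weight.

```latex
\min_{\theta} \;
  \mathcal{L}_{\text{desired}}(\theta)
  \;-\; \lambda \, \mathcal{L}_{\text{harmful}}\bigl(\mathcal{A}_k(\theta)\bigr),
\qquad
\mathcal{A}_k(\theta) = \theta_k, \quad
\theta_0 = \theta, \quad
\theta_{j+1} = \theta_j - \eta \, \nabla_{\theta_j} \mathcal{L}_{\text{harmful}}(\theta_j).
```

Under this reading, the inner recursion simulates the adversary's adaptation (the meta-learning ingredient), while the subtracted term rewards parameters from which that adaptation ends with high harmful-task loss (the adversarial ingredient). The first term keeps performance on the desired task, which in the reported experiment would correspond to profession classification, with gender identification as the harmful task.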
