Controlling the Extraction of Memorized Data from Large Language Models via Prompt-Tuning

05/19/2023
by   Mustafa Safa Ozdayi, et al.

Large Language Models (LLMs) are known to memorize significant portions of their training data. Parts of this memorized content have been shown to be extractable by simply querying the model, which poses a privacy risk. We present a novel approach that uses prompt-tuning to control the extraction rates of memorized content in LLMs. We present two prompt-training strategies to increase and decrease extraction rates, which correspond to an attack and a defense, respectively. We demonstrate the effectiveness of our techniques using models from the GPT-Neo family on a public benchmark. For the 1.3B-parameter GPT-Neo model, our attack yields a 9.3 percentage point increase in extraction rate compared to our baseline. Our defense can be tuned to achieve different privacy-utility trade-offs via a user-specified hyperparameter. We achieve an extraction rate reduction of up to 97.7% with a perplexity increase of 16.9%.
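To make the setup concrete, the sketch below shows soft prompt-tuning against a frozen GPT-Neo model using the Hugging Face transformers API: a small block of learnable prompt embeddings is prepended to the input and trained to raise or lower the model's likelihood of a memorized suffix. The prompt length, learning rate, and the simple sign flip between the attack and defense objectives are illustrative assumptions, not the paper's exact training recipe.

```python
# Minimal sketch of soft prompt-tuning for controlling extraction,
# assuming the Hugging Face transformers API. Hyperparameters and the
# attack/defense objectives below are illustrative, not the paper's.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-neo-1.3B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()
for p in model.parameters():
    p.requires_grad_(False)  # the base model stays frozen; only the soft prompt is trained

prompt_len = 20                                  # assumed soft-prompt length
embed_dim = model.config.hidden_size
soft_prompt = torch.nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)
optimizer = torch.optim.Adam([soft_prompt], lr=1e-3)

def step(prefix_ids, suffix_ids, maximize_extraction=True):
    """One training step on a (prefix, memorized suffix) pair.

    prefix_ids, suffix_ids: LongTensors of shape (batch, seq_len).
    maximize_extraction=True trains an attack prompt (push the suffix
    likelihood up); False trains a defense prompt (push it down).
    """
    token_ids = torch.cat([prefix_ids, suffix_ids], dim=-1)
    embeds = model.get_input_embeddings()(token_ids)
    # Prepend the learnable soft prompt to every sequence in the batch.
    prompt = soft_prompt.unsqueeze(0).expand(embeds.size(0), -1, -1)
    inputs = torch.cat([prompt, embeds], dim=1)
    # Score only the suffix: ignore the soft prompt and prefix positions.
    ignore = torch.full((embeds.size(0), prompt_len + prefix_ids.size(1)), -100)
    labels = torch.cat([ignore, suffix_ids], dim=1)
    loss = model(inputs_embeds=inputs, labels=labels).loss
    # Descending on loss raises suffix likelihood (attack); descending on
    # -loss lowers it (defense).
    (loss if maximize_extraction else -loss).backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

At evaluation time the trained soft prompt would simply be prepended to candidate prefixes before greedy decoding, and the fraction of exactly recovered suffixes gives the extraction rate.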


research · 12/14/2020
Extracting Training Data from Large Language Models
It has become common to publish large (billion parameter) language model...

research · 05/25/2023
Training Data Extraction From Pre-trained Language Models: A Survey
As the deployment of pre-trained language models (PLMs) expands, pressin...

research · 02/13/2023
Targeted Attack on GPT-Neo for the SATML Language Model Data Extraction Challenge
Previous work has shown that Large Language Models are susceptible to so...

research · 09/19/2023
Model Leeching: An Extraction Attack Targeting LLMs
Model Leeching is a novel extraction attack targeting Large Language Mod...

research · 05/24/2023
Tricking LLMs into Disobedience: Understanding, Analyzing, and Preventing Jailbreaks
Recent explorations with commercial Large Language Models (LLMs) have sh...

research · 07/13/2023
Prompts Should not be Seen as Secrets: Systematically Measuring Prompt Extraction Attack Success
The generations of large language models are commonly controlled through...

research · 09/21/2023
A Chinese Prompt Attack Dataset for LLMs with Evil Content
Large Language Models (LLMs) present significant priority in text unders...
