Mitigating Approximate Memorization in Language Models via Dissimilarity Learned Policy

05/02/2023
by Aly M. Kassem, et al.

Large language models (LLMs) are trained on vast amounts of data, which can include sensitive information that may compromise personal privacy. LLMs have been shown to memorize parts of their training data and to emit that data verbatim when an adversary prompts them appropriately. Previous research has primarily focused on data preprocessing and differential privacy techniques to address memorization, or has targeted verbatim memorization exclusively, which can give a false sense of privacy. Moreover, these methods rely on explicit and implicit assumptions about the structure of the data to be protected, which often results in an incomplete solution to the problem. To address this, we propose a novel framework that uses a reinforcement learning approach (PPO) to fine-tune LLMs to mitigate approximate memorization. Our approach uses a negative similarity score, such as BERTScore or SacreBLEU, as a reward signal to learn a dissimilarity policy. Our results demonstrate that this framework effectively mitigates approximate memorization while maintaining high coherence and fluency in the generated samples. Furthermore, the framework remains robust across various conditions, including longer contexts, which are known to increase memorization in LLMs.
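The core idea of the reward is easy to sketch: prompt the model with a training-data prefix, score its continuation against the memorized suffix with a similarity metric, and negate that score so PPO is rewarded for generating dissimilar text. Below is a minimal sketch of such a reward, assuming SacreBLEU as the metric; the function names, normalization, and example strings are illustrative and not taken from the paper.

```python
# Minimal sketch of a dissimilarity reward, assuming SacreBLEU as the
# similarity metric. `dissimilarity_reward` and the [0, 1] scaling are
# illustrative choices; BERTScore could be substituted as the metric.
from typing import List

import sacrebleu


def dissimilarity_reward(generated: str, memorized_suffix: str) -> float:
    """Negative similarity between the policy's continuation and the
    memorized training suffix; higher reward means less memorization."""
    # sentence_bleu returns a score in [0, 100]; normalize to [0, 1].
    similarity = sacrebleu.sentence_bleu(generated, [memorized_suffix]).score / 100.0
    return -similarity  # PPO maximizes reward, hence the negation


def batch_rewards(generations: List[str], suffixes: List[str]) -> List[float]:
    """One scalar reward per generated sequence, as consumed by a PPO step."""
    return [dissimilarity_reward(g, s) for g, s in zip(generations, suffixes)]


if __name__ == "__main__":
    memorized = "John Doe lives at 42 Elm Street, Springfield."  # hypothetical example
    print(dissimilarity_reward(memorized, memorized))              # ~ -1.0 (verbatim emission)
    print(dissimilarity_reward("He lives somewhere in town.", memorized))  # closer to 0.0
```

During fine-tuning, each scalar reward would be fed to a PPO update (for example via an RL fine-tuning library) so the policy learns to drift away from memorized continuations while the usual PPO KL penalty keeps generations coherent.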


Related research

Just Fine-tune Twice: Selective Differential Privacy for Large Language Models (04/15/2022)
With the increasing adoption of NLP models in real-world products, it be...

Knowledge Unlearning for Mitigating Privacy Risks in Language Models (10/04/2022)
Pretrained Language Models (LMs) memorize a vast amount of knowledge dur...

Provably Confidential Language Modelling (05/04/2022)
Large language models are shown to memorize privacy information such as ...

ConQUR: Mitigating Delusional Bias in Deep Q-learning (02/27/2020)
Delusional bias is a fundamental source of error in approximate Q-learni...

Planting and Mitigating Memorized Content in Predictive-Text Language Models (12/16/2022)
Language models are widely deployed to provide automatic text completion...

Mitigating Unintended Memorization in Language Models via Alternating Teaching (10/13/2022)
Recent research has shown that language models have a tendency to memori...

Quantifying Memorization Across Neural Language Models (02/15/2022)
Large language models (LMs) have been shown to memorize parts of their t...
