Hide and Seek (HaS): A Lightweight Framework for Prompt Privacy Protection

by Yu Chen et al.

Numerous companies have started offering services based on large language models (LLMs), such as ChatGPT, which inevitably raises privacy concerns because users' prompts are exposed to the model provider. Previous research on secure inference using multi-party computation (MPC) has proven impractical for LLM applications due to its time-consuming and communication-intensive nature. While lightweight anonymization techniques can protect private information in prompts through substitution or masking, they cannot restore the sensitive entities that appear, substituted, in the LLM-generated results. In this paper, we expand the application scenarios of anonymization techniques by training a small local model to de-anonymize the LLM's returned results with minimal computational overhead. We introduce the HaS framework, where "H(ide)" and "S(eek)" denote its two core processes: hiding private entities for anonymization and seeking private entities for de-anonymization, respectively. To quantitatively assess HaS's privacy protection performance, we propose both black-box and white-box adversarial models. Furthermore, we conduct experiments to evaluate HaS's usability in translation and classification tasks. The experimental findings demonstrate that the HaS framework achieves an optimal balance between privacy protection and utility.
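To make the hide/seek flow concrete, here is a minimal sketch of the idea: substitute private entities with placeholders before the prompt leaves the device, then restore them in the returned text. This is an illustrative assumption, not the paper's implementation; HaS trains small local models for both steps, whereas this sketch uses a simple dictionary-based substitution, and the entity list and placeholder scheme are invented for the example.

```python
# Dictionary-based sketch of Hide-and-Seek prompt anonymization.
# NOTE: the real HaS framework uses trained local models for both
# steps; this simplified substitution is only for illustration.

def hide(prompt, entities):
    """Replace each private entity with a placeholder; return the
    anonymized prompt and the placeholder-to-entity mapping."""
    mapping = {}
    anonymized = prompt
    for i, entity in enumerate(entities):
        placeholder = f"<ENT_{i}>"
        anonymized = anonymized.replace(entity, placeholder)
        mapping[placeholder] = entity
    return anonymized, mapping

def seek(llm_output, mapping):
    """Restore the original entities in the text the LLM returns."""
    restored = llm_output
    for placeholder, entity in mapping.items():
        restored = restored.replace(placeholder, entity)
    return restored

# Round trip with a stubbed LLM that simply echoes its input:
prompt = "Translate: Alice met Bob in Paris."
anonymized, mapping = hide(prompt, ["Alice", "Bob", "Paris"])
llm_result = anonymized  # stand-in for the remote LLM call
print(seek(llm_result, mapping))  # prints the original prompt
```

The remote provider only ever sees the placeholder version of the prompt; the mapping never leaves the user's machine, which is what confines the privacy-critical computation to the lightweight local side.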

