A Customised Text Privatisation Mechanism with Differential Privacy

by Huimin Chen et al.

In Natural Language Understanding (NLU) applications, training an effective model often requires a massive amount of data. However, text data in the real world are scattered across different institutions and user devices. Directly sharing them with the NLU service provider brings huge privacy risks, as text data often contain sensitive information, leading to potential privacy leakage. A typical way to protect privacy is to privatize the raw text directly and use Differential Privacy (DP) to quantify the level of protection. However, existing text privatization mechanisms based on d_χ-privacy are not applicable to all similarity metrics and fail to achieve a good privacy-utility trade-off, primarily because (1) d_χ-privacy imposes strict requirements on the similarity metric, and (2) they treat every input token equally. This poor privacy-utility trade-off impedes the adoption of current text privatization mechanisms in real-world applications. In this paper, we propose a Customised differentially private Text privatization mechanism (CusText) that assigns each input token a customized output set, providing adaptive privacy protection at the token level. It also removes the restriction on similarity metrics imposed by the d_χ-privacy notion by making the mechanism satisfy ε-DP instead. Furthermore, we provide two new text privatization strategies that boost the utility of privatized text without compromising privacy, and we design a new attack strategy to evaluate the protection level of our mechanism empirically from an attacker's perspective. Extensive experiments on two widely used datasets demonstrate that CusText achieves a better privacy-utility trade-off and higher practical value than existing methods.
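To make the token-level idea concrete, here is a minimal sketch of how a per-token output set can be sampled under ε-DP via the standard exponential mechanism. The function names, the toy output sets, and the similarity score below are illustrative assumptions, not the paper's actual implementation details; the only claim carried over from the abstract is that each input token gets its own customized output set and that sampling satisfies ε-DP when the score has bounded sensitivity.

```python
import math
import random

def privatize_token(token, output_sets, score, epsilon, sensitivity=1.0):
    """Sample a replacement for `token` from its customized output set.

    Uses the exponential mechanism: each candidate c is drawn with
    probability proportional to exp(eps * score(token, c) / (2 * Delta)),
    which satisfies epsilon-DP when `score` has sensitivity `Delta`.
    """
    candidates = output_sets[token]
    weights = [
        math.exp(epsilon * score(token, c) / (2.0 * sensitivity))
        for c in candidates
    ]
    # random.choices normalizes the weights internally.
    return random.choices(candidates, weights=weights, k=1)[0]

# Toy example (hypothetical): each token's customized output set.
output_sets = {
    "cat": ["cat", "dog", "pet"],
    "dog": ["dog", "cat", "pet"],
}

def score(token, candidate):
    # Hypothetical similarity score with sensitivity 1:
    # 1.0 for the token itself, 0.0 for any other candidate.
    return 1.0 if token == candidate else 0.0

privatized = [
    privatize_token(t, output_sets, score, epsilon=2.0)
    for t in ["cat", "dog"]
]
```

Note that a larger ε concentrates probability mass on high-scoring candidates (better utility), while a smaller ε flattens the distribution over the output set (stronger privacy), which is the privacy-utility trade-off the mechanism tunes.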




