Towards Automatic Comparison of Data Privacy Documents: A Preliminary Experiment on GDPR-like Laws
General Data Protection Regulation (GDPR) becomes a standard law for data protection in many countries. Currently, twelve countries adopt the regulation and establish their GDPR-like regulation. However, to evaluate the differences and similarities of these GDPR-like regulations is time-consuming and needs a lot of manual effort from legal experts. Moreover, GDPR-like regulations from different countries are written in their languages leading to a more difficult task since legal experts who know both languages are essential. In this paper, we investigate a simple natural language processing (NLP) approach to tackle the problem. We first extract chunks of information from GDPR-like documents and form structured data from natural language. Next, we use NLP methods to compare documents to measure their similarity. Finally, we manually label a small set of data to evaluate our approach. The empirical result shows that the BERT model with cosine similarity outperforms other baselines. Our data and code are publicly available.
READ FULL TEXT