Forgettable Federated Linear Learning with Certified Data Removal
Federated learning (FL) is a trending distributed learning framework that enables collaborative model training without data sharing. Machine learning models trained on datasets can potentially expose the private information of the training data, revealing details about individual data records. In this study, we focus on the FL paradigm that grants clients the “right to be forgotten”. The forgettable FL framework should bleach its global model weights as it has never seen that client and hence does not reveal any information about the client. To this end, we propose the Forgettable Federated Linear Learning (2F2L) framework featured with novel training and data removal strategies. The training pipeline, named Federated linear training, employs linear approximation on the model parameter space to enable our 2F2L framework work for deep neural networks while achieving comparable results with canonical neural network training. We also introduce FedRemoval, an efficient and effective removal strategy that tackles the computational challenges in FL by approximating the Hessian matrix using public server data from the pretrained model. Unlike the previous uncertified and heuristic machine unlearning methods in FL, we provide theoretical guarantees by bounding the differences of model weights by our FedRemoval and that from retraining from scratch. Experimental results on MNIST and Fashion-MNIST datasets demonstrate the effectiveness of our method in achieving a balance between model accuracy and information removal, outperforming baseline strategies and approaching retraining from scratch.
READ FULL TEXT