Generalizing DP-SGD with Shuffling and Batch Clipping
Classical differentially private DP-SGD implements individual clipping with random subsampling, which forces a mini-batch SGD approach. We provide a general differentially private algorithmic framework that goes beyond DP-SGD and allows any first-order optimizer (e.g., classical SGD and momentum-based SGD) in combination with batch clipping, which clips an aggregate of computed gradients rather than summing individually clipped gradients (as is done in individual clipping). The framework also admits sampling techniques beyond random subsampling, such as shuffling. Our DP analysis follows the f-DP approach and introduces a new proof technique that also allows us to analyse group privacy. In particular, for E epochs of work and groups of size g, we show a √(gE) DP dependency for batch clipping with shuffling. This is much better than the previously anticipated linear dependency on g, and it is much better than the previously expected square-root dependency on the total number of rounds within E epochs, which is generally much larger than √E.
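To make the contrast concrete, below is a minimal NumPy sketch of the two clipping styles and of an epoch loop that uses shuffling instead of random subsampling. It is an illustration under simplifying assumptions, not the paper's implementation: the function names (individual_clipping_update, batch_clipping_update, train_with_shuffling), the noise calibration sigma * C, and the plug-in gradient function grad_fn are all hypothetical choices made for this sketch.

import numpy as np

def individual_clipping_update(per_example_grads, C, sigma, lr, w):
    """DP-SGD style: clip each per-example gradient, sum, then add noise (sketch)."""
    clipped = [g * min(1.0, C / (np.linalg.norm(g) + 1e-12)) for g in per_example_grads]
    noisy_sum = np.sum(clipped, axis=0) + np.random.normal(0.0, sigma * C, size=w.shape)
    return w - lr * noisy_sum / len(per_example_grads)

def batch_clipping_update(batch_grad, C, sigma, lr, w):
    """Batch clipping: clip the aggregated mini-batch gradient once, then add noise (sketch)."""
    clipped = batch_grad * min(1.0, C / (np.linalg.norm(batch_grad) + 1e-12))
    noisy = clipped + np.random.normal(0.0, sigma * C, size=w.shape)
    return w - lr * noisy

def train_with_shuffling(data, w, E, m, grad_fn, C, sigma, lr):
    """Each epoch: shuffle the dataset, partition it into fixed-size batches,
    and apply batch clipping to the aggregated gradient of each batch."""
    n = len(data)
    for _ in range(E):
        perm = np.random.permutation(n)
        for start in range(0, n, m):
            batch = [data[i] for i in perm[start:start + m]]
            # Any first-order optimizer could consume this aggregate; plain SGD is used here.
            batch_grad = np.mean([grad_fn(w, x) for x in batch], axis=0)
            w = batch_clipping_update(batch_grad, C, sigma, lr, w)
    return w

Note that in the batch-clipping update only one clipped quantity per batch is released, whereas individual clipping processes every per-example gradient; this is what lets the framework accommodate optimizers whose per-example gradients are never materialized.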