Efficient Differentially Private F_0 Linear Sketching
A powerful feature of linear sketches is that from sketches of two data vectors, one can compute the sketch of the difference between the vectors. This allows us to answer fine-grained questions about the difference between two data sets. In this work, we consider how to achieve this property with sketches that are differentially private. We describe how to compute linear sketches for F_0, the number of distinct elements, that can efficiently be made differentially private. Specifically, we consider a sketch that is linear over GF(2), mapping a vector x ∈ {0,1}^u to Hx ∈ {0,1}^τ for a matrix H sampled from a suitable distribution ℋ. Differential privacy is achieved by using randomized response, flipping each bit of Hx with probability p < 1/2. That is, for a vector φ ∈ {0,1}^τ where Pr[(φ)_j = 1] = p independently for each entry j, we consider the noisy sketch Hx + φ, where the addition of noise happens over GF(2). We show that for every choice of 0 < β, ε < 1 there exist p < 1/2 and a distribution ℋ of linear sketches of size τ = O(log^4(u) ε^{-2} β^{-2}) such that: 1) for random H ∼ ℋ and noise vector φ, given Hx + φ we can compute an estimate of ‖x‖_0 that is accurate within a factor 1 ± β, plus an additive error O(log^3(u) ε^{-2} β^{-2}), with high probability, and 2) for every H ∼ ℋ, Hx + φ is ε-differentially private over the randomness in φ. Previously, Mir et al. (PODS 2011) had described a private way of sketching F_0, but their noise vector φ is constructed using the exponential mechanism, and is not computationally efficient (quasipolynomial time in the sketch size)...
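The noisy sketch Hx + φ and its linearity over GF(2) can be illustrated with a minimal Python sketch. This is not the paper's construction: `sample_matrix` is a hypothetical stand-in that draws each entry of H independently with a fixed density, whereas the distribution ℋ in the paper is not specified in this abstract, and the F_0 estimator is not shown. All function names and parameter values are assumptions made for illustration.

```python
import random

def sample_matrix(tau, u, density, rng):
    """Hypothetical stand-in for H ~ ℋ: each entry is 1 with probability
    `density`. The paper's actual distribution is NOT reproduced here."""
    return [[1 if rng.random() < density else 0 for _ in range(u)]
            for _ in range(tau)]

def sketch(H, x):
    """Compute Hx over GF(2): each output bit is a parity of selected bits of x."""
    return [sum(h & b for h, b in zip(row, x)) % 2 for row in H]

def noise(tau, p, rng):
    """Randomized-response noise vector φ with Pr[φ_j = 1] = p, independently."""
    return [1 if rng.random() < p else 0 for _ in range(tau)]

def xor(a, b):
    """Addition over GF(2), applied entrywise."""
    return [i ^ j for i, j in zip(a, b)]

def combined_flip_probability(p):
    """Flip probability of φ1 + φ2 when both have independent flip probability p:
    exactly one of the two bits must be set."""
    return 2 * p * (1 - p)

# Illustrative parameters (assumptions, not values from the paper).
rng = random.Random(0)
u, tau, p = 64, 16, 0.25
H = sample_matrix(tau, u, 0.5, rng)
x = [rng.randint(0, 1) for _ in range(u)]
y = [rng.randint(0, 1) for _ in range(u)]
phi_x, phi_y = noise(tau, p, rng), noise(tau, p, rng)

noisy_x = xor(sketch(H, x), phi_x)   # Hx + φ_x
noisy_y = xor(sketch(H, y), phi_y)   # Hy + φ_y

# Adding two noisy sketches yields a noisy sketch of the difference x + y,
# with noise φ_x + φ_y.
difference_sketch = xor(noisy_x, noisy_y)
```

Under these assumptions, `difference_sketch` equals `sketch(H, xor(x, y))` plus the combined noise `xor(phi_x, phi_y)`, whose bits are each set with probability 2p(1 − p). The combined noise rate is still below 1/2 but closer to it than p.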