Perfect L_p Sampling in a Data Stream

08/16/2018
by Rajesh Jayaram, et al.

In this paper, we resolve the one-pass space complexity of $L_p$ sampling for $p \in (0,2)$. Given a stream of updates (insertions and deletions) to the coordinates of an underlying vector $f \in \mathbb{R}^n$, a perfect $L_p$ sampler must output an index $i$ with probability $|f_i|^p / \|f\|_p^p$, and is allowed to fail with some probability $\delta$. So far, for $p > 0$ no algorithm has been shown to solve the problem exactly using $\mathrm{poly}(\log n)$ bits of space. In 2010, Monemizadeh and Woodruff introduced an approximate $L_p$ sampler, which outputs $i$ with probability $(1 \pm \nu)|f_i|^p / \|f\|_p^p$, using space polynomial in $\nu^{-1}$ and $\log n$. The space complexity was later reduced by Jowhari, Sağlam, and Tardos to roughly $O(\nu^{-p} \log^2 n \log \delta^{-1})$ for $p \in (0,2)$, which tightly matches the $\Omega(\log^2 n \log \delta^{-1})$ lower bound in terms of $n$ and $\delta$, but is loose in terms of $\nu$. Given these nearly tight bounds, it is perhaps surprising that no lower bound at all exists in terms of $\nu$: not even a bound of $\Omega(\nu^{-1})$ is known. In this paper, we explain this phenomenon by demonstrating the existence of an $O(\log^2 n \log \delta^{-1})$-bit perfect $L_p$ sampler for $p \in (0,2)$. This shows that $\nu$ need not factor into the space of an $L_p$ sampler, which completely closes the complexity of the problem for this range of $p$. For $p = 2$, our bound is $O(\log^3 n \log \delta^{-1})$ bits, which matches the prior best known upper bound of $O(\nu^{-2} \log^3 n \log \delta^{-1})$ but has no dependence on $\nu$. Finally, we show improved upper and lower bounds for returning a $(1 \pm \epsilon)$ relative-error estimate of the frequency $f_i$ of the sampled index $i$.
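To make the sampling target concrete, here is a minimal offline sketch in Python. It is an illustration under stated assumptions, not the authors' algorithm: it holds the whole vector $f$ in memory (so it uses $\Theta(n)$ words, not the polylogarithmic space discussed above) and ignores the failure probability $\delta$. The second function uses exponential scaling, a standard fact about minima of independent exponential random variables that is assumed here for illustration and is not described in this abstract. Both function names are hypothetical.

```python
import random


def lp_target_distribution(f, p):
    """Exact probabilities |f_i|^p / ||f||_p^p that a perfect L_p sampler must match."""
    if p <= 0:
        raise ValueError("p must be positive")
    weights = [abs(x) ** p for x in f]
    total = sum(weights)
    if total == 0:
        raise ValueError("f is the zero vector; there is nothing to sample")
    return [w / total for w in weights]


def exponential_scaling_sample(f, p, rng=random):
    """Offline exact L_p sample via exponential scaling (not a streaming algorithm).

    With E_i ~ Exp(1) i.i.d., E_i / |f_i|^p is exponential with rate |f_i|^p, so
    argmin_i E_i / |f_i|^p (equivalently argmax_i |f_i| / E_i^(1/p)) equals i
    with probability |f_i|^p / ||f||_p^p.
    """
    if p <= 0:
        raise ValueError("p must be positive")
    best_i, best_val = None, float("inf")
    for i, x in enumerate(f):
        if x == 0:
            continue  # zero coordinates have probability 0 and never win
        val = rng.expovariate(1.0) / abs(x) ** p
        if val < best_val:
            best_i, best_val = i, val
    if best_i is None:
        raise ValueError("f is the zero vector; there is nothing to sample")
    return best_i


if __name__ == "__main__":
    rng = random.Random(0)
    f = [3.0, -1.0, 0.0, 2.0]  # e.g. the net result of a stream of insertions/deletions
    p = 1.0
    trials = 100_000
    counts = [0] * len(f)
    for _ in range(trials):
        counts[exponential_scaling_sample(f, p, rng)] += 1
    # Target for p = 1 is 0.5, 0.167, 0.0, 0.333 (rounded); the empirical
    # frequencies in the third column should be close to it.
    for i, target in enumerate(lp_target_distribution(f, p)):
        print(i, round(target, 3), round(counts[i] / trials, 3))
```

Exponential scaling turns exact sampling into a single argmin (equivalently argmax) over rescaled coordinates. A streaming version would have to recover that extreme coordinate from a small linear sketch of the rescaled vector instead of scanning $f$ directly, which is where the space bounds above come in.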
