Tight Lower Bound for Comparison-Based Quantile Summaries

05/09/2019
by Graham Cormode, et al.

Quantiles, such as the median or percentiles, provide concise and useful information about the distribution of a collection of items drawn from a linearly ordered universe. We study data structures called quantile summaries, which keep track of all quantiles up to an error of at most ε. That is, an ε-approximate quantile summary first processes a stream of items and then, given any quantile query 0 < ϕ < 1, returns an item from the stream that is a ϕ'-quantile for some ϕ' = ϕ ± ε. We focus on comparison-based quantile summaries, which can only compare two items and are otherwise completely oblivious of the universe. The best such deterministic quantile summary to date, by Greenwald and Khanna (ACM SIGMOD '01), stores at most O((1/ε)·log(εN)) items, where N is the number of items in the stream. We prove that this space bound is optimal by providing a matching lower bound. Our result thus rules out the possibility of constructing a deterministic comparison-based quantile summary in space f(ε)·o(log N), for any function f that does not depend on N. As a consequence of our results, we also obtain a lower bound for randomized algorithms.
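To make the ε-approximate guarantee concrete, here is a minimal Python sketch. It assumes the whole (non-empty) stream is available at once and that εN ≥ 1. It is not the Greenwald-Khanna algorithm and not a streaming summary. It sorts the entire input using only comparisons and keeps the item at every ⌊εN⌋-th rank, which comes to about 1/ε items. Ranks are 1-based positions in sorted order. A query returns the stored item whose rank is closest to ϕN, and that rank is always within εN of the target. The names build_offline_summary, query and is_eps_approximate are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence, Tuple


def build_offline_summary(stream: Sequence[float], eps: float) -> List[Tuple[int, float]]:
    """Return (rank, item) pairs for every floor(eps*N)-th item of the sorted stream.

    Ranks are 1-based positions in sorted order. OFFLINE sketch only: it sorts
    the whole stream, which a one-pass streaming summary cannot afford to do.
    """
    items = sorted(stream)  # relies only on comparisons between items
    n = len(items)
    if n == 0:
        return []
    step = max(1, math.floor(eps * n))
    ranks = list(range(1, n + 1, step))
    if ranks[-1] != n:
        ranks.append(n)  # always keep the maximum so no gap exceeds `step`
    return [(r, items[r - 1]) for r in ranks]


def query(summary: List[Tuple[int, float]], n: int, phi: float) -> float:
    """Answer a phi-quantile query with the stored item whose rank is closest to phi*n."""
    target = phi * n
    _, item = min(summary, key=lambda pair: abs(pair[0] - target))
    return item


def is_eps_approximate(stream: Sequence[float], x: float, phi: float, eps: float) -> bool:
    """Check that x is a phi'-quantile of the stream for some phi' in [phi - eps, phi + eps].

    With duplicates, x occupies every 1-based rank from lo+1 to hi; x is accepted
    if one of those ranks lies within eps*n of phi*n.
    """
    n = len(stream)
    lo = sum(1 for y in stream if y < x)   # items strictly smaller than x
    hi = sum(1 for y in stream if y <= x)  # items smaller than or equal to x
    return any(abs(r - phi * n) <= eps * n for r in range(lo + 1, hi + 1))
```

For example, with N = 1000 and ε = 0.01 the summary keeps 101 items, and every query ϕ = 0.01, 0.02, …, 0.99 returns an item that passes is_eps_approximate. The lower bound in the abstract says that, once items arrive one at a time, a deterministic comparison-based summary cannot avoid an extra log(εN) factor over this offline construction of roughly 1/ε items.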
