Learning to Sample: Counting with Complex Queries

06/21/2019
by Brett Walenz, et al.

We present a suite of methods for efficiently estimating counts for a generalized class of filters and queries, such as user-defined functions, join predicates, or correlated subqueries. For such queries, traditional sampling techniques may not apply: the complexity of the filter can prevent sampling over joins, and sampling after the join may be infeasible because computing the full join is too costly. Our methods approximate a query's complex filters with a faster probabilistic classifier. From one trained classifier, we estimate counts using either weighted or stratified sampling, or quantify counts directly from the classifier's outputs on test data. We analyze our methods both theoretically and empirically. Theoretical results indicate that a classifier with certain performance guarantees yields an estimator whose counts have much tighter confidence intervals than those of classical simple random sampling or stratified sampling. We evaluate our methods on diverse scenarios using different data sets, counts, and filters, and the results empirically validate the accuracy and efficiency of our approach.
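The stratified-sampling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a cheap `score` function standing in for the trained classifier's probability output, equal-width strata over that score, and an equal sample allocation per stratum. The names `stratified_count_estimate`, `score`, `expensive_predicate`, `n_strata`, and `budget` are all hypothetical.

```python
import random


def stratified_count_estimate(items, score, expensive_predicate,
                              n_strata=4, budget=200, seed=0):
    """Estimate how many items satisfy expensive_predicate.

    score(item) is an assumed cheap proxy for a classifier's probability
    in [0, 1]; it is used only to group items into strata.
    """
    rng = random.Random(seed)

    # Bucket items into equal-width strata by their score.
    strata = [[] for _ in range(n_strata)]
    for item in items:
        idx = min(int(score(item) * n_strata), n_strata - 1)
        strata[idx].append(item)

    non_empty = [s for s in strata if s]
    if not non_empty:
        return 0.0

    # Equal allocation: split the evaluation budget evenly across strata.
    per_stratum = max(1, budget // len(non_empty))

    estimate = 0.0
    for stratum in non_empty:
        k = min(per_stratum, len(stratum))
        sample = rng.sample(stratum, k)
        # The expensive filter runs only on the sampled items.
        hits = sum(1 for item in sample if expensive_predicate(item))
        # Scale the sampled hit rate up to the stratum size.
        estimate += len(stratum) * hits / k
    return estimate
```

When the score separates matching from non-matching items well, each stratum is nearly pure, so the per-stratum hit rates have low variance and few expensive evaluations are needed. With an uninformative score, the sketch behaves much like simple random sampling.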

research
01/19/2018

PRESTO: Probabilistic Cardinality Estimation for RDF Queries Based on Subgraph Overlapping

In query optimisation accurate cardinality estimation is essential for f...
research
07/28/2023

Predicate Transfer: Efficient Pre-Filtering on Multi-Join Queries

This paper presents predicate transfer, a novel method that optimizes jo...
research
11/08/2022

Consistent Query Answering for Primary Keys and Conjunctive Queries with Counting

The problem of consistent query answering for primary keys and self-join...
research
04/09/2020

Computing Local Sensitivities of Counting Queries with Joins

Local sensitivity of a query Q given a database instance D, i.e. how muc...
research
01/07/2022

Weighted Random Sampling over Joins

Joining records with all other records that meet a linkage condition can...
research
10/25/2021

QuantifyML: How Good is my Machine Learning Model?

The efficacy of machine learning models is typically determined by compu...
research
04/12/2018

Fast Counting in Machine Learning Applications

We propose scalable methods to execute counting queries in machine learn...
