Flow-Loss: Learning Cardinality Estimates That Matter

01/13/2021
by Parimarjan Negi, et al.

Previous approaches to learned cardinality estimation have focused on improving average estimation error, but not all estimates matter equally. Since learned models inevitably make mistakes, the goal should be to improve the estimates that make the biggest difference to an optimizer. We introduce a new loss function, Flow-Loss, that explicitly optimizes for better query plans by approximating the optimizer's cost model and dynamic programming search algorithm with analytical functions. At the heart of Flow-Loss is a reduction of query optimization to a flow routing problem on a certain plan graph in which paths correspond to different query plans. To evaluate our approach, we introduce the Cardinality Estimation Benchmark, which contains the ground truth cardinalities for sub-plans of over 16K queries from 21 templates with up to 15 joins. We show that across different architectures and databases, a model trained with Flow-Loss improves the cost of plans (using the PostgreSQL cost model) and query runtimes despite having worse estimation accuracy than a model trained with Q-Error. When the test set queries closely match the training queries, both models improve performance significantly over PostgreSQL and are close to the optimal performance (using true cardinalities). However, the Q-Error trained model degrades significantly when evaluated on queries that are slightly different (e.g., similar but not identical query templates), while the Flow-Loss trained model generalizes better to such situations. For example, the Flow-Loss model achieves up to 1.5x better runtimes on unseen templates compared to the Q-Error model, despite leveraging the same model architecture and training data.
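The claim that not all estimates matter equally can be illustrated with a small sketch. The code below is not Flow-Loss and is not taken from the paper: it only pairs the standard Q-Error definition with a toy plan graph in which each path is one join order. The three-table graph, the cardinalities, and the cost model (the sum of the cardinalities of the sub-plans along a path) are all assumptions chosen for illustration.

```python
import heapq


def q_error(estimate, truth):
    """Q-Error: the factor by which an estimate is off, in either direction."""
    return max(estimate / truth, truth / estimate)


# Hypothetical plan graph for joining tables A, B and C.
# Each path from "start" to "ABC" corresponds to one join order.
PLAN_GRAPH = {
    "start": ["AB", "AC", "BC"],
    "AB": ["ABC"],
    "AC": ["ABC"],
    "BC": ["ABC"],
    "ABC": [],
}


def cheapest_path(cards, source="start", target="ABC"):
    """Dijkstra's algorithm; an edge costs the cardinality of the sub-plan it produces."""
    frontier = [(0.0, [source])]
    settled = set()
    while frontier:
        cost, path = heapq.heappop(frontier)
        node = path[-1]
        if node == target:
            return path
        if node in settled:
            continue
        settled.add(node)
        for nxt in PLAN_GRAPH[node]:
            heapq.heappush(frontier, (cost + cards[nxt], path + [nxt]))
    raise ValueError("target unreachable")


def plan_cost(path, cards):
    """Toy cost model (an assumption): sum of sub-plan cardinalities on the path."""
    return sum(cards[node] for node in path[1:])


# Made-up true cardinalities and two sets of estimates, each with one mistake.
TRUE = {"AB": 1_000, "AC": 2_000, "BC": 1_000_000, "ABC": 500}
EST_BIG_MISS = dict(TRUE, BC=10_000)    # Q-Error 100, on a sub-plan no good plan uses
EST_SMALL_MISS = dict(TRUE, AB=3_000)   # Q-Error 3, on the sub-plan the best plan uses

for name, est in [("big miss on BC", EST_BIG_MISS), ("small miss on AB", EST_SMALL_MISS)]:
    worst = max(q_error(est[k], TRUE[k]) for k in TRUE)
    path = cheapest_path(est)
    print(f"{name}: max Q-Error {worst:g}, plan {path}, true cost {plan_cost(path, TRUE)}")
```

In this toy setting, the estimates with the larger Q-Error still lead to the cheapest plan (true cost 1500), while the estimates with the smaller Q-Error steer the search to a more expensive plan (true cost 2500). This is the kind of mismatch between estimation accuracy and plan quality that the abstract describes.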
