Parallel In-Memory Evaluation of Spatial Joins

08/30/2019
by Dimitrios Tsitsigkos, et al.

The spatial join is a popular operation in spatial database systems, and its evaluation is a well-studied problem. As main memories become larger and faster and commodity hardware supports parallel processing, classic join algorithms designed for I/O-bound processing need to be revisited. In view of this, we study the in-memory and parallel evaluation of spatial joins by re-designing a classic partitioning-based algorithm to consider alternative approaches for space partitioning. Our study shows that, compared to a straightforward implementation of the algorithm, our tuning can improve performance significantly. We also show how to select appropriate partitioning parameters based on data statistics, in order to tune the algorithm for the given join inputs. Our parallel implementation scales gracefully with the number of threads, reducing the cost of the join to at most one second even for join inputs with tens of millions of rectangles.
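To make the idea of a partitioning-based spatial join concrete, here is a minimal sketch of one common variant: both inputs are assigned to the cells of a uniform grid, rectangles are joined cell by cell, and a reference-point test reports each intersecting pair only once. This is an illustration under stated assumptions, not the authors' tuned algorithm. The function names (`cells`, `intersects`, `grid_join`), the `(x1, y1, x2, y2)` rectangle layout, and the `cell_size` parameter are all hypothetical, and the uniform grid is only one of the space-partitioning alternatives the abstract alludes to.

```python
from collections import defaultdict
from itertools import product


def cells(rect, cell_size):
    """Return the (col, row) grid cells that a rectangle overlaps."""
    x1, y1, x2, y2 = rect
    cols = range(int(x1 // cell_size), int(x2 // cell_size) + 1)
    rows = range(int(y1 // cell_size), int(y2 // cell_size) + 1)
    return product(cols, rows)


def intersects(a, b):
    """True if two closed rectangles (x1, y1, x2, y2) share a point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def grid_join(R, S, cell_size):
    """Join rectangle lists R and S; return index pairs (i, j) that intersect."""
    # Partitioning phase: replicate each rectangle into every cell it overlaps.
    part_r, part_s = defaultdict(list), defaultdict(list)
    for i, r in enumerate(R):
        for c in cells(r, cell_size):
            part_r[c].append(i)
    for j, s in enumerate(S):
        for c in cells(s, cell_size):
            part_s[c].append(j)

    # Join phase: each cell is independent of the others, so in a parallel
    # setting the cells could be handed to different threads (assumption).
    results = []
    for c, r_ids in part_r.items():
        for i in r_ids:
            for j in part_s.get(c, ()):
                r, s = R[i], S[j]
                if not intersects(r, s):
                    continue
                # Duplicate avoidance: report the pair only in the cell that
                # contains the lower-left corner of the intersection area.
                px, py = max(r[0], s[0]), max(r[1], s[1])
                if (int(px // cell_size), int(py // cell_size)) == c:
                    results.append((i, j))
    return results
```

In this sketch the choice of `cell_size` trades replication (small cells copy large rectangles into many partitions) against per-cell work (large cells compare many pairs), which is the kind of parameter the abstract proposes to select from data statistics.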

research
11/16/2021

The Case for Learned In-Memory Joins

In-memory join is an essential operator in any database engine. It has b...
research
08/05/2022

Towards Fast Theta-join: A Prefiltering and Amalgamated Partitioning Approach

As one of the most useful online processing techniques, the theta-join o...
research
04/13/2020

Near-Optimal Distributed Band-Joins through Recursive Partitioning

We consider running-time optimization for band-joins in a distributed sy...
research
03/22/2022

Non-recursive Approach for Sort-Merge Join Operation

Several algorithms have been developed over the years to perform join op...
research
12/05/2021

Design Trade-offs for a Robust Dynamic Hybrid Hash Join (Extended Version)

The Join operator, as one of the most expensive and commonly used operat...
research
06/08/2023

Learned spatial data partitioning

Due to the significant increase in the size of spatial data, it is essen...
research
10/15/2019

Optimizing Semi-Stream CACHEJOIN for Near-Real-Time Data Warehousing

Streaming data join is a critical process in the field of near-real-time...
