WISK: A Workload-aware Learned Index for Spatial Keyword Queries

02/28/2023
by Yufan Sheng, et al.

Spatial objects often come with textual information, such as Points of Interest (POIs) with their descriptions; such objects are referred to as geo-textual data. To retrieve such data, spatial keyword queries, which take into account both spatial proximity and textual relevance, have been extensively studied. Existing indexes designed for spatial keyword queries are mostly built from the geo-textual data alone, without considering the distribution of queries already received. However, previous studies have shown that utilizing the known query distribution can improve the index structure for future query processing. In this paper, we propose WISK, a learned index for spatial keyword queries, which self-adapts to optimize query costs for a given query workload. One key challenge is how to utilize both structured spatial attributes and unstructured textual information when learning the index. We first divide the data objects into partitions, aiming to minimize the processing costs of the given query workload. We prove the NP-hardness of the partitioning problem and propose a machine learning model to find the optimal partitions. Then, to achieve more pruning power, we build a hierarchical structure on top of the generated partitions in a bottom-up manner with a reinforcement learning-based approach. We conduct extensive experiments on real-world datasets and query workloads with various distributions, and the results show that WISK outperforms all competitors, achieving up to 8x speedup in querying time with comparable storage overhead.
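
To make the idea of workload-aware partitioning concrete, the following is a minimal sketch and not WISK's actual cost model or learning procedure. It assumes a query is a rectangle plus a keyword set, that each partition is summarized by a bounding box and the union of its objects' keywords, and that a query must scan every object of any partition it cannot prune. All names (`GeoObject`, `Query`, `workload_cost`, and so on) are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoObject:
    x: float
    y: float
    keywords: frozenset


@dataclass(frozen=True)
class Query:
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    keywords: frozenset


def partition_summary(objects):
    """Return (bounding box, keyword union) for a non-empty partition."""
    xs = [o.x for o in objects]
    ys = [o.y for o in objects]
    mbr = (min(xs), min(ys), max(xs), max(ys))
    vocab = frozenset().union(*(o.keywords for o in objects))
    return mbr, vocab


def must_scan(query, mbr, vocab):
    """A partition is pruned if its box misses the query or no keyword is shared."""
    x_min, y_min, x_max, y_max = mbr
    overlaps = (query.x_min <= x_max and x_min <= query.x_max
                and query.y_min <= y_max and y_min <= query.y_max)
    return overlaps and not query.keywords.isdisjoint(vocab)


def workload_cost(partitions, workload):
    """Assumed cost: total number of objects scanned over all queries."""
    summaries = [(partition_summary(p), len(p)) for p in partitions if p]
    return sum(size
               for q in workload
               for (mbr, vocab), size in summaries
               if must_scan(q, mbr, vocab))


# Toy data: two cafes in one corner, two bars in the opposite corner.
a = GeoObject(1, 1, frozenset({"cafe"}))
b = GeoObject(2, 2, frozenset({"cafe"}))
c = GeoObject(8, 8, frozenset({"bar"}))
d = GeoObject(9, 9, frozenset({"bar"}))

workload = [
    Query(0, 0, 3, 3, frozenset({"cafe"})),
    Query(7, 7, 10, 10, frozenset({"bar"})),
]

aligned = [[a, b], [c, d]]   # partitions match what the queries ask for
mixed = [[a, c], [b, d]]     # partitions mix regions and keywords

print(workload_cost(aligned, workload))  # 4
print(workload_cost(mixed, workload))    # 8
```

Under these assumptions, the partitioning aligned with the workload lets each query prune one partition entirely, while the mixed partitioning forces every query to scan all objects. Choosing partitions to minimize such a cost is the kind of objective the abstract describes; this sketch only evaluates candidate partitionings and does not search for one.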

research
05/05/2018

Efficient Top K Temporal Spatial Keyword Search

Massive amount of data that are geo-tagged and associated with text info...
research
02/12/2021

Spatial Interpolation-based Learned Index for Range and kNN Queries

A corpus of recent work has revealed that the learned index can improve ...
research
06/08/2023

Learned spatial data partitioning

Due to the significant increase in the size of spatial data, it is essen...
research
09/08/2017

FAST: Frequency-Aware Spatio-Textual Indexing for In-Memory Continuous Filter Query Processing

Many applications need to process massive streams of spatio-textual data...
research
04/28/2018

QDR-Tree: An Efficient Index Scheme for Complex Spatial Keyword Query

With the popularity of mobile devices and the development of geo-positio...
research
01/29/2021

Distributed Spatial-Keyword kNN Monitoring for Location-aware Pub/Sub

Recent applications employ publish/subscribe (Pub/Sub) systems so that p...
research
03/25/2022

Navigable Proximity Graph-Driven Native Hybrid Queries with Structured and Unstructured Constraints

As research interest surges, vector similarity search is applied in mult...
