Technical Report: KNN Joins Using a Hybrid Approach: Exploiting CPU/GPU Workload Characteristics

10/10/2018
by   Michael Gowanlock, et al.
0

This paper studies finding the K nearest neighbors (KNN) of all points in a dataset. Typical solutions to KNN searches use indexing to prune the search, which reduces the number of candidate points that may be within the set of the nearest K points of each query point. In high dimensionality, index searches degrade, making the KNN self-join a prohibitively expensive operation in some scenarios. Furthermore, there are a significant number of distance calculations needed to determine which points are nearest to each query point. To address these challenges, we propose a hybrid CPU/GPU approach. Since the CPU and GPU are considerably different architectures that are best exploited using different algorithms, we advocate for splitting the work between both architectures based on the characteristic workloads defined by the query points in the dataset. As such, we assign dense regions to the GPU, and sparse regions to the CPU to most efficiently exploit the relative strengths of each architecture. Critically, we find that the relative performance gains over the reference implementation across four real-world datasets are a function of the data properties (size, dimensionality, distribution), and number of neighbors, K.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset