High-Performance and Flexible Parallel Algorithms for Semisort and Related Problems

by Xiaojun Dong, et al.

Semisort is a fundamental algorithmic primitive widely used in the design and analysis of efficient parallel algorithms. It takes as input an array of records and a function that extracts a key from each record, and reorders the records so that those with equal keys are contiguous. Since many applications only require collecting equal values, not fully sorting the input, semisort is broadly applicable, e.g., in string algorithms, graph analytics, and geometry processing, among many other domains. However, despite dozens of recent papers that use semisort in their theoretical analysis and the existence of an asymptotically optimal parallel semisort algorithm, most implementations of these parallel algorithms implement semisort with comparison or integer sorting in practice, due to potential performance issues in existing semisort implementations. In this paper, we revisit the semisort problem, with the goal of achieving a high-performance parallel semisort implementation with a flexible interface. Our approach extends easily to two related problems, histogram and collect-reduce. Our algorithms achieve strong speedups in practice and, importantly, outperform state-of-the-art parallel sorting and semisorting methods in almost all settings we tested, with varying input sizes, distributions, and key types. We also test two important applications with real-world data, and show that our algorithms improve performance over existing approaches. We believe that many other parallel algorithm implementations can be accelerated using our results.
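To make the three problem statements concrete, the following is a minimal sequential sketch in Python. It is not the parallel algorithm from the paper; it only illustrates the input/output contract described above using plain hash bucketing. The function names (`semisort`, `histogram`, `collect_reduce`) and their parameters are chosen for illustration, and the reading of collect-reduce as "combine the values of all records sharing a key with an associative operator" is an assumption, since the abstract does not define it.

```python
from typing import Callable, Dict, Hashable, List, Sequence, TypeVar

R = TypeVar("R")  # record type
V = TypeVar("V")  # value type used by collect_reduce


def semisort(records: Sequence[R], key: Callable[[R], Hashable]) -> List[R]:
    """Reorder records so that records with equal keys are contiguous.

    Groups are NOT ordered relative to each other by key; that is the
    difference from a full sort. Sequential illustration only.
    """
    buckets: Dict[Hashable, List[R]] = {}
    for rec in records:
        buckets.setdefault(key(rec), []).append(rec)
    out: List[R] = []
    for group in buckets.values():
        out.extend(group)
    return out


def histogram(records: Sequence[R], key: Callable[[R], Hashable]) -> Dict[Hashable, int]:
    """Count how many records carry each distinct key."""
    counts: Dict[Hashable, int] = {}
    for rec in records:
        k = key(rec)
        counts[k] = counts.get(k, 0) + 1
    return counts


def collect_reduce(
    records: Sequence[R],
    key: Callable[[R], Hashable],
    value: Callable[[R], V],
    combine: Callable[[V, V], V],
    identity: V,
) -> Dict[Hashable, V]:
    """Assumed semantics: fold the values of equal-key records with `combine`."""
    acc: Dict[Hashable, V] = {}
    for rec in records:
        k = key(rec)
        acc[k] = combine(acc.get(k, identity), value(rec))
    return acc


if __name__ == "__main__":
    data = [("b", 1), ("a", 2), ("b", 3), ("c", 4), ("a", 5)]
    print(semisort(data, key=lambda r: r[0]))
    print(histogram(data, key=lambda r: r[0]))
    print(collect_reduce(data, lambda r: r[0], lambda r: r[1], lambda x, y: x + y, 0))
```

Note that the semisorted output places the `"b"` group before the `"a"` group: equal keys end up adjacent, but no order among distinct keys is promised, which is what separates semisort from sorting.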




Performance Evaluation of Parallel Sortings on the Supercomputer Fugaku

Sorting is one of the most basic algorithms, and developing highly paral...

Engineering In-place (Shared-memory) Sorting Algorithms

We present sorting algorithms that represent the fastest known technique...

LearnedSort as a learning-augmented SampleSort: Analysis and Parallelization

This work analyzes and parallelizes LearnedSort, the novel algorithm tha...

Vectorized and performance-portable Quicksort

Recent works showed that implementations of Quicksort using vector CPU i...

A study of integer sorting on multicores

Integer sorting on multicores and GPUs can be realized by a variety of a...

WiscSort: External Sorting For Byte-Addressable Storage

We present WiscSort, a new approach to high-performance concurrent sorti...

Designing a parallel suffix sort

Suffix sort plays a critical role in various computational algorithms in...
