Faster Parallel Exact Density Peaks Clustering

by Yihao Huang, et al.

Clustering multidimensional points is a fundamental data mining task, with applications in many fields, such as astronomy, neuroscience, bioinformatics, and computer vision. The goal of clustering algorithms is to group similar objects together. Density-based clustering is a clustering approach that defines clusters as dense regions of points. It has the advantage of being able to detect clusters of arbitrary shapes, rendering it useful in many applications. In this paper, we propose fast parallel algorithms for Density Peaks Clustering (DPC), a popular version of density-based clustering. Existing exact DPC algorithms suffer from low parallelism both in theory and in practice, which limits their application to large-scale data sets. Our most performant algorithm, which is based on priority search kd-trees, achieves O(log n log log n) span (parallel time complexity) for a data set of n points. Our algorithm is also work-efficient, achieving a work complexity matching the best existing sequential exact DPC algorithm. In addition, we present another DPC algorithm based on a Fenwick tree that makes fewer assumptions for its average-case complexity to hold. We provide optimized implementations of our algorithms and evaluate their performance via extensive experiments. On a 30-core machine with two-way hyperthreading, we find that our best algorithm achieves a 10.8–13169x speedup over the previous best parallel exact DPC algorithm. Compared to the state-of-the-art parallel approximate DPC algorithm, our best algorithm achieves a 1.5–4206x speedup, while being exact.
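For readers unfamiliar with DPC, the sketch below shows the standard formulation as a sequential, brute-force O(n^2) baseline. It is not the paper's algorithm: it uses neither priority search kd-trees nor a Fenwick tree and has no parallelism. The function name `density_peaks` and the parameters `d_cut`, `rho_min`, and `delta_min` are illustrative assumptions, as is the rule of breaking density ties by index.

```python
import math

def density_peaks(points, d_cut, rho_min, delta_min):
    """Brute-force O(n^2) DPC baseline (illustrative; not the paper's method).

    Returns one cluster label per point, with -1 marking noise.
    """
    n = len(points)
    dist = lambda a, b: math.dist(points[a], points[b])

    # Step 1: density = number of points within d_cut (the point itself included).
    rho = [sum(1 for j in range(n) if dist(i, j) <= d_cut) for i in range(n)]

    # Step 2: dependent point = nearest point of strictly higher rank.
    # Rank orders by density; ties are broken by index (an assumption).
    rank = lambda i: (rho[i], -i)
    dep = [-1] * n
    delta = [math.inf] * n
    for i in range(n):
        for j in range(n):
            if rank(j) > rank(i) and dist(i, j) < delta[i]:
                dep[i], delta[i] = j, dist(i, j)

    # Step 3: visit points in decreasing rank so that each dependent point
    # is labeled before the points that depend on it.
    labels = [-1] * n
    next_label = 0
    for i in sorted(range(n), key=rank, reverse=True):
        if rho[i] < rho_min:
            continue  # too sparse: leave as noise
        if delta[i] >= delta_min:
            labels[i] = next_label  # far from any denser point: cluster center
            next_label += 1
        else:
            labels[i] = labels[dep[i]]  # inherit the dependent point's cluster
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(density_peaks(pts, d_cut=1.5, rho_min=1, delta_min=5))
```

Steps 1 and 2 each compare every pair of points, which is the quadratic cost that spatial index structures are meant to avoid.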




