Parallel Filtered Graphs for Hierarchical Clustering

by Shangdi Yu, et al.

Given all pairwise weights (distances) among a set of objects, filtered graphs provide a sparse representation by keeping only an important subset of the weights. Such graphs can be passed to graph clustering algorithms to generate hierarchical clusters. In particular, the directed bubble hierarchical tree (DBHT) algorithm on filtered graphs has been shown to produce good hierarchical clusters for time series data. We propose a new parallel algorithm for constructing triangulated maximally filtered graphs (TMFG), which produces valid inputs for DBHT, and a scalable parallel algorithm for generating DBHTs that is optimized for TMFG inputs. In addition to parallelizing the original TMFG construction, which has limited parallelism, we design a new algorithm that inserts multiple vertices in each round to enable more parallelism. We show that the graphs generated by our new algorithm are similar in quality to the original TMFGs while being much faster to generate. Our new parallel algorithms for TMFGs and DBHTs are 136–2483x faster than state-of-the-art implementations, achieve up to 41.56x self-relative speedup on 48 cores with hyper-threading, and produce better clustering results than the standard average-linkage and complete-linkage hierarchical clustering algorithms. We also show that, on a stock data set, our algorithms produce clusters that align well with human experts' classification.
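
To make the construction concrete, here is a minimal sequential sketch of the greedy TMFG build that the abstract refers to as the original construction. It is not the parallel algorithm proposed in the paper. It assumes a symmetric similarity matrix in which larger values mark more important pairs, and it seeds the graph with the four vertices of largest total similarity, which is a simplification. The function name `tmfg_edges` and the toy matrix are hypothetical.

```python
from itertools import combinations


def tmfg_edges(w):
    """Greedy sequential TMFG sketch.

    `w` is assumed to be a symmetric similarity matrix (list of lists)
    where a larger value means a more important pair.
    Returns the sorted list of kept edges as (i, j) tuples with i < j.
    """
    n = len(w)
    if n < 4:
        raise ValueError("need at least 4 vertices")

    # Seed with the four vertices of largest total similarity
    # (a simplifying assumption for this sketch).
    order = sorted(range(n), key=lambda v: sum(w[v]) - w[v][v], reverse=True)
    seed = order[:4]

    # The seed forms a tetrahedron: 6 edges and 4 triangular faces.
    edges = {tuple(sorted(e)) for e in combinations(seed, 2)}
    faces = [tuple(f) for f in combinations(seed, 3)]
    remaining = set(range(n)) - set(seed)

    def gain(vertex, face):
        # Total similarity between a candidate vertex and a face's corners.
        return sum(w[vertex][u] for u in face)

    while remaining:
        # Pick the (vertex, face) pair with the largest gain.
        candidates = ((v, f) for v in sorted(remaining) for f in faces)
        best_v, best_face = max(candidates, key=lambda p: gain(p[0], p[1]))

        # Insert the vertex into the face: 3 new edges, 1 face becomes 3.
        a, b, c = best_face
        edges.update(tuple(sorted((best_v, u))) for u in best_face)
        faces.remove(best_face)
        faces.extend([(a, b, best_v), (a, c, best_v), (b, c, best_v)])
        remaining.remove(best_v)

    return sorted(edges)


if __name__ == "__main__":
    n = 7
    # Toy similarity matrix: closer indices are more similar.
    toy = [[1.0 / (1 + abs(i - j)) for j in range(n)] for i in range(n)]
    kept = tmfg_edges(toy)
    print(len(kept), "edges kept out of", n * (n - 1) // 2)
```

Each loop iteration inserts exactly one vertex, so the rounds are inherently sequential. This is the limited parallelism that the abstract says the new algorithm addresses by inserting multiple vertices in each round. The result always has 3n - 6 edges for n vertices, since the tetrahedron contributes 6 edges and each insertion adds 3.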




