Scalable and Robust Sparse Subspace Clustering Using Randomized Clustering and Multilayer Graphs

02/21/2018
by   Maryam Abdolali, et al.
Sparse subspace clustering (SSC) is one of the current state-of-the-art methods for partitioning data points into a union of subspaces, with strong theoretical guarantees. However, it is not practical for large data sets, as it requires solving a LASSO problem for each data point, where the number of variables in each LASSO problem is the number of data points. To improve the scalability of SSC, we propose to select a few sets of anchor points using a randomized hierarchical clustering method and, for each set of anchor points, to solve the LASSO problem for each data point while allowing only anchor points to have a non-zero weight (this drastically reduces the number of variables). This generates a multilayer graph in which each layer corresponds to a different set of anchor points. Using the Grassmann manifold of orthogonal matrices, the shared connectivity among the layers is summarized within a single subspace. Finally, we use k-means clustering within that subspace to cluster the data points, as is done in the spectral clustering step of SSC. We show on both synthetic and real-world data sets that the proposed method not only allows SSC to scale to large data sets, but is also much more robust: it performs significantly better on noisy data and on data with close subspaces and outliers, and it is not prone to oversegmentation.
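The pipeline described in the abstract (anchor sets, anchor-restricted LASSO, one graph layer per anchor set, a merged subspace, then k-means) can be illustrated with a minimal NumPy sketch. Several pieces are assumptions made for illustration and are not taken from the paper: anchors are drawn by uniform random sampling rather than by randomized hierarchical clustering, the LASSO problems are solved with a plain ISTA loop, each layer's affinity is taken as |C||C|^T, and the layers are merged by taking the bottom eigenvectors of the summed Laplacians minus a weighted sum of the per-layer subspace projections. All function names and parameter values (`lam`, `alpha`, `n_layers`, `n_anchors`) are hypothetical.

```python
import numpy as np

def anchor_lasso(X, idx, lam=0.05, n_iter=300):
    """ISTA for min_C 0.5*||X - C A||_F^2 + lam*||C||_1, with A = X[idx]."""
    A = X[idx]                                  # (m, d) anchor points
    step = 1.0 / (np.linalg.norm(A, 2) ** 2)    # 1 / Lipschitz constant of the gradient
    C = np.zeros((X.shape[0], len(idx)))
    for _ in range(n_iter):
        Z = C - step * (C @ A - X) @ A.T        # gradient step on the quadratic term
        C = np.sign(Z) * np.maximum(np.abs(Z) - step * lam, 0.0)  # soft-thresholding
        C[idx, np.arange(len(idx))] = 0.0       # an anchor may not represent itself
    return C

def layer_laplacian(C):
    """Normalized Laplacian of the affinity W = |C| |C|^T (an assumed choice)."""
    B = np.abs(C)
    W = B @ B.T
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    return np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

def bottom_eigvecs(L, k):
    """Eigenvectors belonging to the k smallest eigenvalues of a symmetric matrix."""
    _, V = np.linalg.eigh(L)                    # eigenvalues come back in ascending order
    return V[:, :k]

def merge_layers(laplacians, k, alpha=0.5):
    """Assumed merging rule: sum of Laplacians minus alpha * per-layer projections."""
    L_mod = sum(laplacians)
    for L in laplacians:
        U = bottom_eigvecs(L, k)
        L_mod = L_mod - alpha * (U @ U.T)
    return bottom_eigvecs(L_mod, k)

def kmeans(U, k, rng, n_iter=50):
    """Plain Lloyd iterations on the rows of U."""
    centers = U[rng.choice(len(U), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels

def cluster(X, k, n_layers=3, n_anchors=20, seed=0):
    rng = np.random.default_rng(seed)
    laplacians = []
    for _ in range(n_layers):                   # one graph layer per anchor set
        idx = rng.choice(len(X), size=n_anchors, replace=False)
        laplacians.append(layer_laplacian(anchor_lasso(X, idx)))
    U = merge_layers(laplacians, k)
    return kmeans(U, k, rng), U

# Toy data: 40 unit-norm points on each of two random 2-D subspaces of R^10.
rng = np.random.default_rng(1)
bases = [np.linalg.qr(rng.standard_normal((10, 2)))[0] for _ in range(2)]
X = np.vstack([rng.standard_normal((40, 2)) @ B.T for B in bases])
X /= np.linalg.norm(X, axis=1, keepdims=True)
labels, U = cluster(X, k=2)
```

Each LASSO problem here has `n_anchors` variables instead of one per data point, which is the source of the scalability gain the abstract describes. The sketch still forms a dense n-by-n affinity per layer for readability, so it is only suitable for small inputs.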
