Sketched Subspace Clustering

07/22/2017
by   Panagiotis A. Traganitis, et al.

The immense volume of data generated and communicated daily presents unique processing challenges. Clustering, the grouping of data without ground-truth labels, is an important tool for drawing inferences from data. Subspace clustering (SC) is a relatively recent method that can successfully group nonlinearly separable data in a multitude of settings. Despite their high clustering accuracy, SC methods incur prohibitively high computational complexity when processing large volumes of high-dimensional data. Inspired by random sketching approaches for dimensionality reduction, the present paper introduces a randomized scheme for SC, termed Sketch-SC, tailored for large volumes of high-dimensional data. Sketch-SC accelerates the computationally heavy parts of state-of-the-art SC approaches by compressing the data matrix across both dimensions using random projections, thus enabling fast and accurate large-scale SC. Performance analysis as well as extensive numerical tests on real data corroborate the potential of Sketch-SC and its competitive performance relative to state-of-the-art scalable SC approaches.
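To make the idea of compressing the data matrix across both dimensions concrete, here is a minimal sketch in Python. It is not the authors' Sketch-SC algorithm, which this abstract does not spell out; it only illustrates two-sided compression with Gaussian random projections. The function name `sketch_both_dimensions`, the parameter names, and the choice of Gaussian sketching matrices are assumptions made for illustration.

```python
import numpy as np


def sketch_both_dimensions(X, d, n, seed=0):
    """Compress a D x N data matrix along both dimensions (illustrative only).

    Assumptions (not taken from the paper): Gaussian sketching matrices,
    scaled so that squared norms are preserved in expectation.
    """
    rng = np.random.default_rng(seed)
    D, N = X.shape

    # Left sketch: reduces the ambient dimension of every datum from D to d.
    R_left = rng.standard_normal((d, D)) / np.sqrt(d)
    # Right sketch: mixes the N data columns down to n columns.
    R_right = rng.standard_normal((N, n)) / np.sqrt(n)

    X_reduced = R_left @ X       # d x N: all data points, fewer features
    B = X_reduced @ R_right      # d x n: compressed in both dimensions
    return X_reduced, B


# Hypothetical usage on synthetic data: 1000 points in 200 dimensions.
X = np.random.default_rng(1).standard_normal((200, 1000))
X_reduced, B = sketch_both_dimensions(X, d=20, n=50)
```

Any downstream clustering step would then work with the small matrices `X_reduced` and `B` rather than the full `X`; how Sketch-SC actually uses its compressed matrices is described in the full paper, not here.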

research
10/06/2015

Large-scale subspace clustering using sketching and validation

The nowadays massive amounts of generated and communicated data present ...
research
01/22/2015

Sketch and Validate for Big Data Clustering

In response to the need for learning tools tuned to big data analytics, ...
research
09/18/2020

Probabilistically Sampled and Spectrally Clustered Plant Genotypes using Phenotypic Characteristics

Clustering genotypes based upon their phenotypic characteristics is used...
research
10/27/2022

Clustering High-dimensional Data via Feature Selection

High-dimensional clustering analysis is a challenging problem in statist...
research
03/30/2018

Fast and Robust Subspace Clustering Using Random Projections

Over the past several decades, subspace clustering has been receiving in...
research
04/06/2017

DIMM-SC: A Dirichlet mixture model for clustering droplet-based single cell transcriptomic data

Motivation: Single cell transcriptome sequencing (scRNA-Seq) has become ...
research
10/15/2020

Selective Classification via One-Sided Prediction

We propose a novel method for selective classification (SC), a problem w...
