Clustering Without an Eigengap

08/29/2023
by Matthew Zurek, et al.

We study graph clustering in the Stochastic Block Model (SBM) in the presence of both large clusters and small, unrecoverable clusters. Previous approaches achieving exact recovery do not allow any small clusters of size o(√n), or require a size gap between the smallest recovered cluster and the largest non-recovered cluster. We provide an algorithm based on semidefinite programming (SDP) which removes these requirements and provably recovers large clusters regardless of the remaining cluster sizes. Mid-sized clusters pose unique challenges to the analysis, since their proximity to the recovery threshold makes them highly sensitive to small noise perturbations and precludes a closed-form candidate solution. We develop novel techniques, including a leave-one-out-style argument which controls the correlation between SDP solutions and noise vectors even when the removal of one row of noise can drastically change the SDP solution. We also develop improved eigenvalue perturbation bounds of potential independent interest. Using our gap-free clustering procedure, we obtain efficient algorithms for the problem of clustering with a faulty oracle with superior query complexities, notably achieving o(n^2) sample complexity even in the presence of a large number of small clusters. Our gap-free clustering procedure also leads to improved algorithms for recursive clustering. Our results extend to certain heterogeneous probability settings that are challenging for alternative algorithms.
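To make the setting concrete, the following is a minimal sketch of sampling a graph from a stochastic block model that mixes one large cluster with several small ones. It only generates the input instance and does not implement the SDP-based recovery procedure described above. The function name `sample_sbm`, the cluster sizes, and the edge probabilities `p` and `q` are illustrative assumptions, not values taken from the paper.

```python
import random

def sample_sbm(cluster_sizes, p, q, seed=0):
    """Sample a symmetric 0/1 adjacency matrix from a stochastic block model.

    All parameters are illustrative assumptions: nodes in the same cluster
    are joined with probability p, nodes in different clusters with
    probability q, independently for each unordered pair.
    """
    rng = random.Random(seed)
    # Node i gets the index of the cluster it belongs to.
    labels = [c for c, size in enumerate(cluster_sizes) for _ in range(size)]
    n = len(labels)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            prob = p if labels[i] == labels[j] else q
            if rng.random() < prob:
                adj[i][j] = adj[j][i] = 1  # undirected edge, no self-loops
    return labels, adj

# One large cluster alongside several small ones (sizes chosen arbitrarily).
labels, adj = sample_sbm([60, 5, 5, 3], p=0.7, q=0.1, seed=1)
```

In an instance like this, the small clusters contribute too few within-cluster edges to stand out from the noise, which is the regime where the abstract says recovery of the large clusters should not depend on the sizes of the remaining ones.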

