Adjusted chi-square test for degree-corrected block models

by Linfan Zhang, et al.

We propose a goodness-of-fit test for degree-corrected stochastic block models (DCSBM). The test is based on an adjusted chi-square statistic for measuring equality of means among groups of n multinomial distributions with d_1,…,d_n observations. In the context of network models, the number of multinomials, n, grows much faster than the number of observations, d_i, so the setting deviates from classical asymptotics. We show that a simple adjustment allows the statistic to converge in distribution, under the null, as long as the harmonic mean of {d_i} grows to infinity. This result applies to large sparse networks, where the role of d_i is played by the degree of node i. Our distributional results are nonasymptotic, with explicit constants, providing finite-sample bounds on the Kolmogorov-Smirnov distance to the target distribution. When applied sequentially, the test can also be used to determine the number of communities. The test operates on a (row) compressed version of the adjacency matrix, conditional on the degrees, and as a result is highly scalable to large sparse networks. We incorporate a novel idea of compressing the columns based on a (K+1)-community assignment when testing for K communities. This approach increases the power in sequential applications without sacrificing computational efficiency, and we prove its consistency in recovering the number of communities. Since the test statistic does not rely on a specific alternative, its utility goes beyond sequential testing: it can be used to simultaneously test against a wide range of alternatives outside the DCSBM family. We demonstrate the effectiveness of the approach through extensive numerical experiments with simulated and real data. In particular, applying the test to the Facebook-100 dataset, we find that a DCSBM with a small number of communities is far from a good fit in almost all cases.
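
To make the setting concrete, the following is a minimal Python sketch of a pooled Pearson chi-square statistic for groups of multinomial rows, together with the harmonic mean of the row totals d_i. The abstract does not state the exact form of the adjustment, so the centering by n(L-1) and scaling by sqrt(2n(L-1)) used here are an assumption chosen for illustration, as are the function names `harmonic_mean` and `standardized_chi_square` and the count-matrix layout (n rows, L columns). It should not be read as the authors' statistic.

```python
import numpy as np


def harmonic_mean(d):
    """Harmonic mean of strictly positive row totals d_1, ..., d_n."""
    d = np.asarray(d, dtype=float)
    if np.any(d <= 0):
        raise ValueError("all d_i must be strictly positive")
    return d.size / np.sum(1.0 / d)


def standardized_chi_square(X, labels):
    """Pooled Pearson chi-square over groups of multinomial rows.

    X      : (n, L) array of counts; row i has total d_i = X[i].sum().
    labels : length-n array of group (community) labels.

    ASSUMPTION: the statistic is centered by n(L-1) and scaled by
    sqrt(2 n (L-1)); this is an illustrative choice, not taken from
    the source text.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n, L = X.shape
    d = X.sum(axis=1)
    if np.any(d <= 0):
        raise ValueError("every row must have a positive total")

    total = 0.0
    for k in np.unique(labels):
        rows = labels == k
        # Pooled estimate of the shared probability vector in group k.
        p_hat = X[rows].sum(axis=0) / d[rows].sum()
        # Columns with p_hat == 0 hold only zeros and contribute nothing.
        keep = p_hat > 0
        expected = np.outer(d[rows], p_hat[keep])
        total += np.sum((X[rows][:, keep] - expected) ** 2 / expected)

    dof = n * (L - 1)
    return (total - dof) / np.sqrt(2.0 * dof)


if __name__ == "__main__":
    # Simulated null: two groups, each with one shared probability vector.
    rng = np.random.default_rng(0)
    n, L = 2000, 4
    labels = rng.integers(0, 2, size=n)
    probs = np.array([[0.4, 0.3, 0.2, 0.1], [0.1, 0.2, 0.3, 0.4]])
    d = rng.integers(20, 60, size=n)
    X = np.vstack([rng.multinomial(d[i], probs[labels[i]]) for i in range(n)])
    print("harmonic mean of d_i:", harmonic_mean(d))
    print("standardized statistic:", standardized_chi_square(X, labels))
```

In this sketch each row of `X` stands in for one node's (compressed) row of the adjacency matrix and `d` for its degree. Large positive values of the standardized statistic would suggest that rows within a group do not share a common probability vector.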


