Clustering US States by Time Series of COVID-19 New Case Counts with Non-negative Matrix Factorization
The spreading pattern of COVID-19 differs considerably across US states, which adopted different quarantine measures and reopening policies. We proposed to cluster the US states into distinct communities based on their daily new confirmed case counts, using a nonnegative matrix factorization (NMF) followed by k-means clustering of the coefficients on the NMF basis. A cross-validation method was employed to select the rank of the NMF. Applying the method to the entire study period from March 22 to July 25, we clustered the 49 continental states (including the District of Columbia) into 7 groups, two of which contained a single state. To investigate how the clustering changes over time, the same method was applied successively to time periods in increments of one week, starting with the period from March 22 to March 28. The results suggested a change point in the clustering in the week starting May 30, which might be explained by the combined impact of quarantine measures and reopening policies.
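The NMF-then-k-means pipeline can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' code. The case counts are assumed to form a days × states matrix X ≈ WH, so each column of H holds one state's coefficients on the r basis curves in W. NMF is fitted with the Lee and Seung multiplicative updates for squared error. Each state's coefficients are normalized to sum to one (a hypothetical choice, so that clusters reflect the shape of the curve rather than its size). k-means uses a deterministic farthest-point initialization. The rank r and the number of clusters k are fixed by hand rather than selected by the paper's cross-validation, and the data are synthetic.

```python
import numpy as np

def nmf(X, rank, n_iter=1000, seed=0, eps=1e-10):
    """Factor non-negative X (days x states) as W @ H with Lee-Seung
    multiplicative updates for the Frobenius (squared-error) loss."""
    rng = np.random.default_rng(seed)
    n_days, n_states = X.shape
    W = rng.random((n_days, rank)) + eps   # basis time series (days x r)
    H = rng.random((rank, n_states)) + eps  # coefficients (r x states)
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def kmeans(points, k, n_iter=100):
    """Lloyd's k-means on the rows of `points`, seeded by picking
    each new center as the point farthest from the existing centers."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def cluster_states(X, rank, k, seed=0):
    """Hypothetical helper: NMF on X, then k-means on each state's
    normalized coefficient vector (one row per state)."""
    _, H = nmf(X, rank, seed=seed)
    coef = H.T / (H.T.sum(axis=1, keepdims=True) + 1e-12)
    return kmeans(coef, k)

# Synthetic example: three states peak early, three peak late.
days = np.arange(60)
early = np.exp(-((days - 15) / 6.0) ** 2)
late = np.exp(-((days - 45) / 6.0) ** 2)
X = np.column_stack([early * s for s in (100, 200, 150)]
                    + [late * s for s in (80, 120, 300)])
labels = cluster_states(X, rank=2, k=2)
print(labels)
```

Because each toy state's curve is a scaled copy of one of two shapes, the three early-peaking and three late-peaking states should land in separate clusters despite their different case volumes. A week-by-week analysis like the one described would rerun `cluster_states` on each successive time window of X.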