Scalable GWR: A linear-time algorithm for large-scale geographically weighted regression with polynomial kernels
While a number of studies have developed fast geographically weighted regression (GWR) algorithms for large samples, none of them achieves the linear-time estimation that is considered requisite for big data analysis in machine learning, geostatistics, and related domains. Against this backdrop, this study proposes a scalable GWR (ScaGWR) for large datasets. The key development is the calibration of the model through a pre-compression of the matrices and vectors whose size depends on the sample size, performed before leave-one-out cross-validation (LOOCV), which is the heaviest computational step in conventional GWR. This pre-compression allows the proposed GWR extension to run in a computation time that increases linearly with the sample size, whereas conventional GWR algorithms take at most quasi-quadratic-order time. With this development, the ScaGWR can be calibrated with more than one million samples without parallelization. Moreover, the ScaGWR estimator can be regarded as an empirical Bayesian estimator that is more stable than the conventional GWR estimator. This study compares the ScaGWR with the conventional GWR in terms of estimation accuracy, predictive accuracy, and computational efficiency using a Monte Carlo simulation. We then apply these methods to a residential land analysis in the Tokyo Metropolitan Area. The code for ScaGWR is available in the R package scgwr, and will be incorporated into another R package, GWmodel.
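The pre-compression idea can be illustrated with a small sketch. It is not the paper's estimator: it assumes a single covariate with no intercept, one-dimensional coordinates, a Gaussian base kernel, and a weight of the form sum_p coefs[p] * k^(p+1), and all function names are hypothetical. The all-pairs loop is quadratic and is used only for brevity, so the sketch does not reproduce the linear-time behavior described in the abstract. What it shows is that once the kernel-weighted sums are stored per site, the LOOCV score for any candidate polynomial coefficients can be evaluated without revisiting the sample-sized data.

```python
import math


def base_kernel(distance, bandwidth):
    """Gaussian base kernel (an assumption); polynomial weights use its powers."""
    return math.exp(-((distance / bandwidth) ** 2))


def precompress(coords, x, y, bandwidth, degree):
    """Per site i, sum k_ij^p * x_j^2 and k_ij^p * x_j * y_j over j != i, p = 1..degree."""
    n = len(x)
    sxx = [[0.0] * degree for _ in range(n)]
    sxy = [[0.0] * degree for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # leave site i out, as LOOCV requires
            k = base_kernel(abs(coords[i] - coords[j]), bandwidth)
            kp = 1.0
            for p in range(degree):
                kp *= k  # kp now equals k ** (p + 1)
                sxx[i][p] += kp * x[j] * x[j]
                sxy[i][p] += kp * x[j] * y[j]
    return sxx, sxy


def loocv_score(sxx, sxy, x, y, coefs):
    """LOOCV squared error computed from the compressed sums only."""
    sse = 0.0
    for i in range(len(x)):
        num = sum(c * s for c, s in zip(coefs, sxy[i]))
        den = sum(c * s for c, s in zip(coefs, sxx[i]))
        sse += (y[i] - (num / den) * x[i]) ** 2
    return sse


def loocv_score_direct(coords, x, y, bandwidth, coefs):
    """Reference LOOCV that rebuilds every weight from the raw data."""
    n = len(x)
    sse = 0.0
    for i in range(n):
        num = den = 0.0
        for j in range(n):
            if i == j:
                continue
            k = base_kernel(abs(coords[i] - coords[j]), bandwidth)
            w = sum(c * k ** (p + 1) for p, c in enumerate(coefs))
            num += w * x[j] * y[j]
            den += w * x[j] * x[j]
        sse += (y[i] - (num / den) * x[i]) ** 2
    return sse


if __name__ == "__main__":
    # Made-up toy data, for illustration only.
    coords = [0.0, 0.1, 0.35, 0.6, 0.8, 1.0]
    x = [1.0, 2.0, 1.5, 0.5, 2.5, 1.2]
    y = [1.1, 2.3, 2.0, 0.9, 4.0, 2.2]
    sxx, sxy = precompress(coords, x, y, bandwidth=0.3, degree=3)
    for coefs in ([1.0, 0.0, 0.0], [0.5, 0.3, 0.2]):
        print(coefs, loocv_score(sxx, sxy, x, y, coefs),
              loocv_score_direct(coords, x, y, 0.3, coefs))
```

In this sketch, `precompress` runs once, and each call to `loocv_score` then costs a number of operations proportional to the sample size times the polynomial degree, whereas `loocv_score_direct` repeats the pairwise work for every candidate. The two functions should agree up to floating-point rounding.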