Spatial Analysis Made Easy with Linear Regression and Kernels
Kernel methods are an incredibly popular technique for extending linear models to non-linear problems via a mapping to an implicit, high-dimensional feature space. While kernel methods are computationally cheaper than an explicit feature mapping, they are still subject to cubic cost on the number of points. Given only a few thousand locations, this computational cost rapidly outstrips the currently available computational power. This paper aims to provide an overview of kernel methods from first-principals (with a focus on ridge regression), before progressing to a review of random Fourier features (RFF), a set of methods that enable the scaling of kernel methods to big datasets. At each stage, the associated R code is provided. We begin by illustrating how the dual representation of ridge regression relies solely on inner products and permits the use of kernels to map the data into high-dimensional spaces. We progress to RFFs, showing how only a few lines of code provides a significant computational speed-up for a negligible cost to accuracy. We provide an example of the implementation of RFFs on a simulated spatial data set to illustrate these properties. Lastly, we summarise the main issues with RFFs and highlight some of the advanced techniques aimed at alleviating them.
READ FULL TEXT