Statistical Consistency of Kernel PCA with Random Features
Kernel methods are powerful learning methodologies that provide a simple way to construct nonlinear algorithms from linear ones. Despite their popularity, they suffer from poor scalability in big data scenarios. Various approximation methods, including random feature approximation, have been proposed to alleviate this problem. However, the statistical consistency of most of these approximate kernel methods is not well understood, except for kernel ridge regression, for which it has been shown that the random feature approximation is not only computationally efficient but also statistically consistent with a minimax optimal rate of convergence. In this paper, we investigate the efficacy of random feature approximation in the context of kernel principal component analysis (KPCA) by studying the statistical behavior of approximate KPCA. We show that approximate KPCA is either computationally efficient or statistically efficient (i.e., achieves the same convergence rate as KPCA), but not both. This means that, in the context of KPCA, random feature approximation provides computational efficiency at the cost of statistical efficiency.
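To make the setting concrete, the following is a minimal sketch (not the paper's implementation) of approximate KPCA: map the data through random Fourier features approximating a Gaussian kernel, then perform linear PCA in the feature space. The function names, the choice of RBF kernel, and all parameter values are illustrative assumptions.

```python
import numpy as np

def random_fourier_features(X, n_features, gamma, rng):
    """Random Fourier feature map approximating the RBF kernel
    k(x, y) = exp(-gamma * ||x - y||^2)."""
    d = X.shape[1]
    # Frequencies drawn from the kernel's spectral density (Gaussian with variance 2*gamma).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def approximate_kpca(X, n_components, n_features, gamma, seed=0):
    """Approximate KPCA: linear PCA applied to the random-feature map of X."""
    rng = np.random.default_rng(seed)
    Z = random_fourier_features(X, n_features, gamma, rng)
    Z_centered = Z - Z.mean(axis=0)            # center in the (approximate) feature space
    # SVD of the centered feature matrix gives the principal directions.
    _, s, Vt = np.linalg.svd(Z_centered, full_matrices=False)
    components = Vt[:n_components]             # top principal directions
    scores = Z_centered @ components.T         # projections of the data
    eigvals = (s[:n_components] ** 2) / X.shape[0]
    return scores, eigvals

# Illustrative usage: project 500 points in R^5 onto the top 2 approximate kernel principal components.
X = np.random.default_rng(1).normal(size=(500, 5))
scores, eigvals = approximate_kpca(X, n_components=2, n_features=200, gamma=0.5)
```

The number of random features controls the trade-off the paper studies: fewer features reduce the computational cost of the eigendecomposition, while more features are needed for the approximate principal subspace to converge at the rate of exact KPCA.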