Generalization error of random features and kernel methods: hypercontractivity and kernel matrix concentration

01/26/2021
by Song Mei, et al.

Consider the classical supervised learning problem: we are given data (y_i, x_i), i ≤ n, with y_i a response and x_i ∈ 𝒳 a covariate vector, and we try to learn a model f: 𝒳 → ℝ to predict future responses. Random features methods map the covariate vector x_i to a point ϕ(x_i) in a higher-dimensional space ℝ^N, via a random featurization map ϕ. We study the use of random features methods in conjunction with ridge regression in the feature space ℝ^N. This can be viewed as a finite-dimensional approximation of kernel ridge regression (KRR), or as a stylized model for neural networks in the so-called lazy training regime. We define a class of problems satisfying certain spectral conditions on the underlying kernels and a hypercontractivity assumption on the associated eigenfunctions. These conditions are verified by classical high-dimensional examples. Under these conditions, we prove a sharp characterization of the error of random features ridge regression. In particular, we address two fundamental questions: (1) What is the generalization error of KRR? (2) How large must N be for the random features approximation to achieve the same error as KRR? In this setting, we prove that KRR is well approximated by a projection onto the top ℓ eigenfunctions of the kernel, where ℓ depends on the sample size n. We show that the test error of random features ridge regression is dominated by its approximation error and is larger than the error of KRR as long as N ≤ n^{1-δ} for some δ > 0. We characterize this gap. For N ≥ n^{1+δ}, random features achieve the same error as the corresponding KRR, and further increasing N does not lead to a significant change in test error.
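
To make the setup concrete, here is a minimal numerical sketch of random features ridge regression next to the KRR it approximates. The ReLU feature map, the arc-cosine limiting kernel, the unit-sphere covariates, the toy target f_star, and all dimensions and the ridge penalty lam are illustrative assumptions, not the paper's exact construction; the point is only that ridge regression on ϕ(x_i) ∈ ℝ^N approaches the kernel method as N grows.

import numpy as np

# Minimal sketch (assumed setting, not the paper's exact construction):
# random features ridge regression (RFRR) with a ReLU feature map versus
# kernel ridge regression (KRR) with the corresponding expected kernel.
rng = np.random.default_rng(0)
d, n, n_test, N = 20, 500, 200, 2000   # covariate dim, train/test sizes, number of features
lam = 1e-3                             # ridge penalty (illustrative choice)

def sphere(m):                         # covariates on the unit sphere (assumed setting)
    Z = rng.standard_normal((m, d))
    return Z / np.linalg.norm(Z, axis=1, keepdims=True)

X, X_test = sphere(n), sphere(n_test)
f_star = lambda Z: Z[:, 0] ** 2 + 0.5 * Z[:, 1]   # toy target function
y = f_star(X) + 0.1 * rng.standard_normal(n)

# Random featurization map phi(x) = relu(Wx)/sqrt(N), W with i.i.d. N(0,1) entries.
W = rng.standard_normal((N, d))
phi = lambda Z: np.maximum(Z @ W.T, 0.0) / np.sqrt(N)

# RFRR: ordinary ridge regression in the N-dimensional feature space.
Phi = phi(X)
a_hat = np.linalg.solve(Phi.T @ Phi + lam * np.eye(N), Phi.T @ y)
rfrr_pred = phi(X_test) @ a_hat

# KRR with the expected kernel E[phi(x).phi(x')] of this feature map,
# i.e. the first-order arc-cosine kernel (valid for unit-norm inputs).
def arccos_kernel(A, B):
    G = np.clip(A @ B.T, -1.0, 1.0)
    theta = np.arccos(G)
    return (np.sin(theta) + (np.pi - theta) * G) / (2 * np.pi)

alpha = np.linalg.solve(arccos_kernel(X, X) + lam * np.eye(n), y)
krr_pred = arccos_kernel(X_test, X) @ alpha

print("RFRR test MSE:", np.mean((rfrr_pred - f_star(X_test)) ** 2))
print("KRR  test MSE:", np.mean((krr_pred - f_star(X_test)) ** 2))

With N much larger than n the two predictors essentially coincide, while shrinking N below n lets the RFRR error drift above the KRR error, in line with the N ≤ n^{1-δ} versus N ≥ n^{1+δ} regimes described above.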

