Finite-sample and asymptotic analysis of generalization ability with an application to penalized regression

09/12/2016
by   Ning Xu, et al.

In this paper, we study the performance of extremum estimators from the perspective of generalization ability (GA): the ability of a model to predict outcomes in new samples from the same population. By adapting classical concentration inequalities, we derive upper bounds on the empirical out-of-sample prediction error as a function of the in-sample error, the in-sample data size, the heaviness of the tails of the error distribution, and model complexity. We show that these error bounds may be used to tune key estimation hyper-parameters, such as the number of folds K in cross-validation, and we show how K affects the bias-variance trade-off of cross-validation. We demonstrate that the L_2-norm difference between penalized and the corresponding unpenalized regression estimates is directly explained by the GA of the estimates and the GA of the empirical moment conditions. Lastly, we prove that all penalized regression estimates are L_2-consistent in both the n ≥ p and the n < p cases. Simulations are used to demonstrate the key results.

Keywords: generalization ability, upper bound of generalization error, penalized regression, cross-validation, bias-variance trade-off, L_2 difference between penalized and unpenalized regression, lasso, high-dimensional data.
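The generalization gap the abstract describes, the difference between in-sample error and out-of-sample error estimated by K-fold cross-validation, can be illustrated with a small simulation. The sketch below is not the paper's method: it uses ridge regression (a closed-form penalized estimator) as a stand-in for a generic penalized regression, with arbitrary illustrative choices of n, p, the penalty level lam, and K.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]          # sparse true coefficients
y = X @ beta + rng.normal(scale=0.5, size=n)

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate: (X'X + lam*I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def kfold_cv_error(X, y, lam, K):
    """Average held-out MSE over K folds (an out-of-sample error estimate)."""
    idx = np.arange(len(y))
    errs = []
    for test in np.array_split(idx, K):
        train = np.setdiff1d(idx, test)
        b = ridge_fit(X[train], y[train], lam)
        errs.append(np.mean((y[test] - X[test] @ b) ** 2))
    return float(np.mean(errs))

b_full = ridge_fit(X, y, lam=1.0)
in_sample = float(np.mean((y - X @ b_full) ** 2))
oos = kfold_cv_error(X, y, lam=1.0, K=5)
# The gap (oos - in_sample) is the empirical generalization gap that the
# paper's concentration bounds control in terms of n, tail heaviness, and
# model complexity; rerunning with different K shows the bias-variance
# trade-off of the CV estimate itself.
```

Varying K in `kfold_cv_error` (e.g. K = 2 versus K = n for leave-one-out) changes how much data each fold's fit sees, which is the bias-variance trade-off for cross-validation discussed in the abstract.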


Related research

10/18/2016
Generalization error minimization: a new approach to model evaluation and selection with an application to penalized regression
We study model evaluation and model selection from the perspective of ge...

05/20/2017
(β, ϖ)-stability for cross-validation and the choice of the number of folds
In this paper, we introduce a new concept of stability for cross-validat...

03/28/2019
An analysis of the cost of hyper-parameter selection via split-sample validation, with applications to penalized regression
In the regression setting, given a set of hyper-parameters, a model-esti...

04/10/2021
Analytic and Bootstrap-after-Cross-Validation Methods for Selecting Penalty Parameters of High-Dimensional M-Estimators
We develop two new methods for selecting the penalty parameter for the ℓ...

07/30/2020
Rademacher upper bounds for cross-validation errors with an application to the lasso
We establish a general upper bound for K-fold cross-validation (K-CV) er...

06/26/2022
Prediction Errors for Penalized Regressions based on Generalized Approximate Message Passing
We discuss the prediction accuracy of assumed statistical models in term...

01/28/2019
On Random Subsampling of Gaussian Process Regression: A Graphon-Based Analysis
In this paper, we study random subsampling of Gaussian process regressio...
