Bootstrapping the Cross-Validation Estimate

07/01/2023
by Bryan Cai, et al.

Cross-validation is a widely used technique for evaluating the performance of prediction models. It helps avoid the optimism bias in error estimates, which can be significant for models built using complex statistical learning algorithms. However, since the cross-validation estimate is a random quantity that depends on the observed data, it is essential to accurately quantify the uncertainty associated with the estimate. This is especially important when comparing the performance of two models via cross-validation, as one must determine whether differences in error estimates reflect genuine differences or chance fluctuations. Although various methods have been developed for making inferences on cross-validation estimates, they often suffer from limitations such as stringent model assumptions. This paper proposes a fast bootstrap method that quickly estimates the standard error of the cross-validation estimate and produces valid confidence intervals for a population parameter measuring average model performance. Our method overcomes the computational challenge inherent in bootstrapping the cross-validation estimate by estimating the variance component within a random effects model, and it is just as flexible as the cross-validation procedure itself. To showcase the effectiveness of our approach, we employ comprehensive simulations and real data analysis across three diverse applications.
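To make the computational challenge concrete, here is a minimal sketch of the naive nonparametric bootstrap of a K-fold cross-validation error estimate, the procedure whose cost motivates the paper's variance-component shortcut. All function names and the ordinary-least-squares model are illustrative assumptions, not the authors' implementation.

```python
# Naive bootstrap of the K-fold cross-validation error (illustrative sketch;
# not the paper's fast method, which avoids rerunning CV for every replicate).
import numpy as np

def kfold_cv_error(X, y, k=5, rng=None):
    """Mean squared prediction error of OLS, estimated by K-fold CV."""
    rng = rng if rng is not None else np.random.default_rng(0)
    n = len(y)
    idx = rng.permutation(n)
    folds = np.array_split(idx, k)
    errs = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)
        # Fit least squares on the training folds, evaluate on the held-out fold.
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        errs.append(np.mean((y[fold] - X[fold] @ beta) ** 2))
    return np.mean(errs)

def bootstrap_cv_se(X, y, B=200, k=5, seed=0):
    """Standard error of the CV estimate via the naive bootstrap:
    resample rows with replacement, rerun the full CV each time."""
    rng = np.random.default_rng(seed)
    n = len(y)
    reps = []
    for _ in range(B):
        b = rng.integers(0, n, size=n)   # bootstrap sample of row indices
        reps.append(kfold_cv_error(X[b], y[b], k=k, rng=rng))
    return np.std(reps, ddof=1)
```

Each bootstrap replicate reruns the entire cross-validation, so the total cost is roughly B times that of a single CV run; this is the burden the paper's random-effects estimator is designed to sidestep.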


research
04/01/2021

Cross-validation: what does it estimate and how well does it do it?

Cross-validation is a widely-used technique to estimate prediction error...
research
01/25/2019

Rescaling and other forms of unsupervised preprocessing introduce bias into cross-validation

Cross-validation of predictive models is the de facto standard for model...
research
03/09/2016

Computing AIC for black-box models using Generalised Degrees of Freedom: a comparison with cross-validation

Generalised Degrees of Freedom (GDF), as defined by Ye (1998 JASA 93:120...
research
12/29/2021

Application of the Pythagorean Expected Wins Percentage and Cross-Validation Methods in Estimating Team Quality

The Pythagorean Expected Wins Percentage Model was developed by Bill Jam...
research
04/25/2021

Model-based metrics: Sample-efficient estimates of predictive model subpopulation performance

Machine learning models - now commonly developed to screen, diagnose, or...
research
05/28/2020

Estimating the Prediction Performance of Spatial Models via Spatial k-Fold Cross Validation

In machine learning one often assumes the data are independent when eval...
research
01/09/2018

Test Error Estimation after Model Selection Using Validation Error

When performing supervised learning with the model selected using valida...
