Hyperparameter Selection for Subsampling Bootstraps

by Yingying Ma, et al.

As massive data analysis becomes increasingly prevalent, subsampling methods such as the Bag of Little Bootstraps (BLB) serve as powerful tools for assessing the quality of estimators for massive data. However, the performance of subsampling methods is highly influenced by the selection of tuning parameters (e.g., the subset size and the number of resamples per subset). In this article, we develop a hyperparameter selection methodology that can be used to select tuning parameters for subsampling methods. Specifically, through careful theoretical analysis, we find an analytically simple and elegant relationship between the asymptotic efficiency of various subsampling estimators and their hyperparameters. This leads to an optimal choice of the hyperparameters. More specifically, an arbitrarily specified hyperparameter set can be improved to a new set of hyperparameters at no extra CPU time cost, while the resulting estimator's statistical efficiency can be much improved. Both simulation studies and real data analysis demonstrate the advantages of our method.
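To make the tuning parameters concrete, the following is a minimal sketch of the standard BLB procedure, not of the selection method proposed in the article. It assumes the estimator is the sample mean and the quality measure is its standard error; the function name `blb_standard_error` and its parameter names are hypothetical and chosen only for illustration.

```python
import numpy as np


def blb_standard_error(data, subset_size, n_subsets, n_resamples, seed=0):
    """BLB estimate of the standard error of the sample mean (illustrative)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = len(data)
    uniform = np.full(subset_size, 1.0 / subset_size)
    subset_ses = []
    for _ in range(n_subsets):
        # Hyperparameter 1: subset size b, drawn without replacement.
        subset = rng.choice(data, size=subset_size, replace=False)
        # Hyperparameter 2: number of resamples r per subset. Each resample
        # has nominal size n and is stored as multinomial counts over the
        # b subset points, so no array of length n is ever materialized.
        counts = rng.multinomial(n, uniform, size=n_resamples)
        # Weighted sample mean of each resample, shape (n_resamples,).
        resample_means = counts @ subset / n
        # Standard error estimated within this subset.
        subset_ses.append(resample_means.std(ddof=1))
    # Hyperparameter 3: number of subsets s; average the per-subset estimates.
    return float(np.mean(subset_ses))
```

The three arguments `subset_size`, `n_subsets`, and `n_resamples` are the kind of tuning parameters the abstract refers to: together they determine both the computational cost and the variability of the resulting estimate.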


Optimal Subsampling Bootstrap for Massive Data

The bootstrap is a widely used procedure for statistical inference becau...

A Scalable Bootstrap for Massive Data

The bootstrap provides a simple and powerful means of assessing the qual...

JITuNE: Just-In-Time Hyperparameter Tuning for Network Embedding Algorithms

Network embedding (NE) can generate succinct node representations for ma...

Discrete Simulation Optimization for Tuning Machine Learning Method Hyperparameters

Machine learning methods are being increasingly used in most technical a...

The Big Data Bootstrap

The bootstrap provides a simple and powerful means of assessing the qual...

How much progress have we made in neural network training? A New Evaluation Protocol for Benchmarking Optimizers

Many optimizers have been proposed for training deep neural networks, an...

What is the best predictor that you can compute in five minutes using a given Bayesian hierarchical model?

The goal of this paper is to provide a way for statisticians to answer t...
