Confidence Intervals for Random Forests: The Jackknife and the Infinitesimal Jackknife

11/18/2013
by Stefan Wager, et al.

We study the variability of predictions made by bagged learners and random forests, and show how to estimate standard errors for these methods. Our work builds on variance estimates for bagging proposed by Efron (1992, 2012) that are based on the jackknife and the infinitesimal jackknife (IJ). In practice, bagged predictors are computed using a finite number B of bootstrap replicates, and working with a large B can be computationally expensive. Direct applications of jackknife and IJ estimators to bagging require B on the order of n^1.5 bootstrap replicates to converge, where n is the size of the training set. We propose improved versions that only require B on the order of n replicates. Moreover, we show that the IJ estimator requires 1.7 times fewer bootstrap replicates than the jackknife to achieve a given accuracy. Finally, we study the sampling distributions of the jackknife and IJ variance estimates themselves. We illustrate our findings with multiple experiments and simulation studies.
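To make the idea concrete, here is a minimal sketch of an IJ variance estimate for a bagged predictor. The abstract does not state the formulas, so the form used here is an assumption: the raw estimate is the sum, over training points, of the squared covariance (across replicates) between how often a point appears in a bootstrap sample and that replicate's prediction, and the correction subtracts a Monte Carlo noise term n/B^2 * sum_b (t_b - t_bar)^2. The function names and the toy base learner (the mean of each bootstrap sample) are hypothetical and chosen only for illustration.

```python
import random


def ij_variance(counts, preds):
    """Raw and bias-corrected IJ variance estimates (assumed form, see lead-in).

    counts[b][i] is how many times training point i appears in replicate b;
    preds[b] is the prediction made by the learner fit on replicate b.
    """
    B, n = len(preds), len(counts[0])
    t_bar = sum(preds) / B
    raw = 0.0
    for i in range(n):
        n_bar_i = sum(c[i] for c in counts) / B
        # Covariance across replicates between the count of point i
        # and the replicate predictions.
        cov_i = sum((c[i] - n_bar_i) * (t - t_bar)
                    for c, t in zip(counts, preds)) / B
        raw += cov_i ** 2
    # Monte Carlo noise term, which shrinks as B grows.
    mc_bias = n / B ** 2 * sum((t - t_bar) ** 2 for t in preds)
    return raw, raw - mc_bias


def bagged_mean_replicates(x, B, seed=0):
    """Toy base learner: each replicate predicts its bootstrap sample mean."""
    rng = random.Random(seed)
    n = len(x)
    counts, preds = [], []
    for _ in range(B):
        c = [0] * n
        for _draw in range(n):
            c[rng.randrange(n)] += 1
        counts.append(c)
        preds.append(sum(ci * xi for ci, xi in zip(c, x)) / n)
    return counts, preds


if __name__ == "__main__":
    x = [float(i) for i in range(20)]
    counts, preds = bagged_mean_replicates(x, B=2000)
    raw, corrected = ij_variance(counts, preds)
    print("raw IJ estimate:      ", raw)
    print("corrected IJ estimate:", corrected)
```

For this toy learner the estimate can be checked by hand: as B grows, both values should approach sum_i (x_i - x_bar)^2 / n^2, the plug-in variance of the sample mean. With a real random forest, preds would instead hold each tree's prediction at a fixed test point.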

research
01/26/2022

Confidence Intervals for the Generalisation Error of Random Forests

Out-of-bag error is commonly used as an estimate of generalisation error...
research
04/14/2018

A bootstrap analysis for finite populations

Bootstrap methods are increasingly accepted as one of the common approac...
research
10/24/2017

Estimating the Operating Characteristics of Ensemble Methods

In this paper we present a technique for using the bootstrap to estimate...
research
08/08/2017

Exponential Random Graph Models with Big Networks: Maximum Pseudolikelihood Estimation and the Parametric Bootstrap

With the growth of interest in network data across fields, the Exponenti...
research
05/04/2022

Multivariate Prediction Intervals for Random Forests

Accurate uncertainty estimates can significantly improve the performance...
research
04/25/2014

Quantifying Uncertainty in Random Forests via Confidence Intervals and Hypothesis Tests

This work develops formal statistical inference procedures for machine l...
research
04/20/2020

Inference by Stochastic Optimization: A Free-Lunch Bootstrap

Assessing sampling uncertainty in extremum estimation can be challenging...