Variable importance in binary regression trees and forests

11/15/2007
by Hemant Ishwaran, et al.

We characterize and study variable importance (VIMP) and pairwise variable associations in binary regression trees. A key component of the analysis is the node mean squared error of a quantity we refer to as a maximal subtree. The theory extends naturally from single trees to ensembles of trees and applies to methods such as random forests. This is useful because, while importance values from random forests are routinely used to screen variables (for example, to filter high-throughput genomic data in bioinformatics), very little theory exists about their properties.
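As a concrete illustration of the kind of importance measure the abstract discusses, the sketch below computes a permutation-style VIMP for a random forest: permute one variable at a time and record the drop in predictive accuracy. This is a minimal sketch, assuming scikit-learn and synthetic data; it is not the paper's maximal-subtree analysis, only the screening procedure whose properties the paper studies.

```python
# Permutation-style variable importance (VIMP) sketch.
# Assumptions: scikit-learn is available; data are synthetic, with a
# binary outcome driven only by the first two of six variables.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n, p = 500, 6
X = rng.normal(size=(n, p))
# Binary outcome depends on x0 and x1; the remaining columns are noise.
y = (X[:, 0] + 2.0 * X[:, 1] + 0.5 * rng.normal(size=n) > 0).astype(int)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

def permutation_vimp(model, X, y, n_repeats=10, rng=rng):
    """Average drop in accuracy after permuting each column in turn."""
    base = model.score(X, y)
    vimp = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])
            vimp[j] += base - model.score(Xp, y)
    return vimp / n_repeats

scores = permutation_vimp(forest, X, y)
print(np.round(scores, 3))
```

In practice the signal variables x0 and x1 receive clearly larger VIMP than the noise columns, which is exactly the screening behavior whose theoretical justification the paper addresses.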


Related research

- Fréchet random forests (06/04/2019)
- Estimation and Inference with Trees and Forests in High Dimensions (07/07/2020)
- Understanding Random Forests: From Theory to Practice (07/28/2014)
- Dealing with Uncertain Inputs in Regression Trees (10/27/2018)
- Best Split Nodes for Regression Trees (06/24/2019)
- From unbiased MDI Feature Importance to Explainable AI for Trees (03/26/2020)
- Illuminant Estimation using Ensembles of Multivariate Regression Trees (03/15/2017)
