Adapting and Evaluating Influence-Estimation Methods for Gradient-Boosted Decision Trees

04/30/2022
by Jonathan Brophy, et al.

Influence estimation analyzes how changes to the training data can lead to different model predictions; this analysis can help us better understand these predictions, the models making them, and the data sets on which they are trained. However, most influence-estimation techniques are designed for deep learning models with continuous parameters. Gradient-boosted decision trees (GBDTs) are a powerful and widely used class of models; however, these models are black boxes with opaque decision-making processes. In the pursuit of better understanding GBDT predictions and generally improving these models, we adapt recent and popular influence-estimation methods designed for deep learning models to GBDTs. Specifically, we adapt representer-point methods and TracIn, denoting our new methods TREX and BoostIn, respectively; source code is available at https://github.com/jjbrophy47/tree_influence. We compare these methods to LeafInfluence and other baselines using 5 different evaluation measures on 22 real-world data sets with 4 popular GBDT implementations. These experiments give us a comprehensive overview of how different approaches to influence estimation work in GBDT models. We find BoostIn is an efficient influence-estimation method for GBDTs that performs as well as or better than existing work while being four orders of magnitude faster. Our evaluation also suggests that the gold-standard approach of leave-one-out (LOO) retraining consistently identifies the single most influential training example but performs poorly at finding the most influential set of training examples for a given target prediction.
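As a concrete illustration of the leave-one-out (LOO) baseline discussed above, the following is a minimal sketch of brute-force LOO influence estimation for a GBDT. It is not the paper's BoostIn or TREX implementation (see the linked repository for those); it uses scikit-learn's GradientBoostingClassifier on a synthetic data set, and the data, model settings, and variable names are illustrative assumptions.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic data: 150 training and 50 test examples (arbitrary choices).
X, y = make_classification(n_samples=200, random_state=0)
X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

def loss(model, x, label):
    # Negative log-likelihood of the true label for a single example.
    p = model.predict_proba(x.reshape(1, -1))[0, label]
    return -np.log(np.clip(p, 1e-12, None))

full_model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
target_x, target_y = X_test[0], y_test[0]
base_loss = loss(full_model, target_x, target_y)

# LOO influence of training example i on the target prediction: the change
# in the target's loss when the model is retrained without example i.
# Positive influence means removing the example increases the loss,
# i.e., the example helped the prediction.
influence = np.zeros(len(X_train))
for i in range(len(X_train)):
    keep = np.arange(len(X_train)) != i
    model_i = GradientBoostingClassifier(random_state=0).fit(X_train[keep], y_train[keep])
    influence[i] = loss(model_i, target_x, target_y) - base_loss

print("most helpful training example:", int(influence.argmax()))
print("most harmful training example:", int(influence.argmin()))

This brute-force approach requires one full retraining per training example, which is why efficient retraining-free approximations such as the paper's adapted methods matter in practice.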
