Optimal Resampling for Learning Small Models

05/04/2019
by Abhishek Ghose, et al.

Models often need to be constrained to a certain size to be considered interpretable; for example, a decision tree of depth 5 is much easier to make sense of than one of depth 30. This suggests a trade-off between interpretability and accuracy. Our work tries to minimize this trade-off by identifying the optimal distribution of the data to learn from, which, surprisingly, may differ from the original distribution. We use an Infinite Beta Mixture Model (IBMM) to represent a specific set of sampling schemes. The parameters of the IBMM are learned using a Bayesian Optimizer (BO). Even under simplistic assumptions, a distribution in the original d-dimensional space would require optimizing O(d) variables, which is cumbersome for most real-world data. Our technique reduces this number to a fixed set of 8 variables, at the cost of some additional preprocessing. The proposed technique is model-agnostic: it can be applied to any classifier. It also admits a general notion of model size. We demonstrate its effectiveness on multiple real-world datasets by constructing decision trees, linear probability models and gradient boosted models.
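
To make the idea of learning from a resampled distribution concrete, here is a minimal Python sketch; it is not the authors' implementation. It makes several assumptions that the abstract does not state: each training point has already been mapped to a scalar score in [0, 1] by some preprocessing step (the `scores` list is hypothetical), a small finite Beta mixture stands in for the IBMM, and the outer optimizer that would tune the mixture parameters is omitted. The names `beta_pdf`, `mixture_weights`, and `resample` are illustrative only.

```python
import math
import random


def beta_pdf(x, a, b):
    """Density of a Beta(a, b) distribution at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def mixture_weights(scores, components):
    """Unnormalized sampling weight for each point under a finite Beta mixture.

    scores:     hypothetical per-point scalars in [0, 1] from preprocessing.
    components: list of (mixing_weight, a, b) tuples (assumed finite here).
    """
    eps = 1e-6  # keep scores strictly inside (0, 1) so the logs are defined
    weights = []
    for s in scores:
        s = min(max(s, eps), 1 - eps)
        weights.append(sum(w * beta_pdf(s, a, b) for w, a, b in components))
    return weights


def resample(points, scores, components, n, seed=0):
    """Draw n training points (with replacement) according to the mixture."""
    rng = random.Random(seed)
    weights = mixture_weights(scores, components)
    return rng.choices(points, weights=weights, k=n)


# Usage: a single Beta(8, 2) component favors points with high scores.
points = list(range(1000))
scores = [i / 999 for i in points]
sample = resample(points, scores, [(1.0, 8.0, 2.0)], n=500)
mean_score = sum(scores[p] for p in sample) / len(sample)
```

In a full pipeline, an outer optimizer would propose mixture parameters, a size-constrained model would be trained on the resulting sample, and its validation accuracy would be fed back as the objective. Because the search is over the mixture parameters rather than over the d input dimensions, the number of optimized variables stays fixed regardless of the data's dimensionality.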

research
06/17/2019

Learning Interpretable Models Using an Oracle

As Machine Learning (ML) becomes pervasive in various real world systems...
research
05/26/2023

Improving Stability in Decision Tree Models

Owing to their inherently interpretable structure, decision trees are co...
research
01/30/2023

Optimal Decision Tree Policies for Markov Decision Processes

Interpretability of reinforcement learning policies is essential for man...
research
07/02/2021

Decision tree heuristics can fail, even in the smoothed setting

Greedy decision tree learning heuristics are mainstays of machine learni...
research
10/08/2022

Accurate Small Models using Adaptive Sampling

We highlight the utility of a certain property of model training: instea...
research
03/05/2021

Efficient Encrypted Inference on Ensembles of Decision Trees

Data privacy concerns often prevent the use of cloud-based machine learn...
