Coresets for Decision Trees of Signals

10/07/2021
by   Ibrahim Jubran, et al.
0

A k-decision tree t (or k-tree) is a recursive partition of a matrix (2D-signal) into k≥ 1 block matrices (axis-parallel rectangles, leaves) where each rectangle is assigned a real label. Its regression or classification loss to a given matrix D of N entries (labels) is the sum of squared differences over every label in D and its assigned label by t. Given an error parameter ε∈(0,1), a (k,ε)-coreset C of D is a small summarization that provably approximates this loss to every such tree, up to a multiplicative factor of 1±ε. In particular, the optimal k-tree of C is a (1+ε)-approximation to the optimal k-tree of D. We provide the first algorithm that outputs such a (k,ε)-coreset for every such matrix D. The size |C| of the coreset is polynomial in klog(N)/ε, and its construction takes O(Nk) time. This is by forging a link between decision trees from machine learning – to partition trees in computational geometry. Experimental results on and show that applying our coresets on real-world data-sets boosts the computation time of random forests and their parameter tuning by up to x10, while keeping similar accuracy. Full open source code is provided.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset