Coverage-centric Coreset Selection for High Pruning Rates

by Haizhong Zheng, et al.

One-shot coreset selection aims to select a subset of the training data, given a pruning rate, that can achieve high accuracy for models that are subsequently trained only on that subset. State-of-the-art coreset selection methods typically assign an importance score to each example and select the most important examples to form a coreset. These methods perform well at low pruning rates, but at high pruning rates they have been found to suffer a catastrophic accuracy drop, performing worse than even random coreset selection. In this paper, we explore the reasons for this accuracy drop both theoretically and empirically. We extend previous theoretical results on the bound for model loss in terms of the coverage provided by the coreset. Inspired by these theoretical results, we propose a novel coverage-based metric and, based on this metric, find that coresets selected by importance-based methods at high pruning rates can be expected to perform poorly compared to random coresets because of worse data coverage. We then propose a new coreset selection method, Coverage-centric Coreset Selection (CCS), which jointly considers overall data coverage based on the proposed metric as well as the importance of each example. We evaluate CCS on four datasets and show that it achieves significantly better accuracy than state-of-the-art coreset selection methods as well as random sampling at high pruning rates, and comparable performance at low pruning rates. For example, CCS achieves 7.04% better accuracy than random sampling and at least 20.16% better accuracy than state-of-the-art coreset selection methods on CIFAR10 with a 90% pruning rate.
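The contrast between importance-only selection and coverage-aware selection can be illustrated with a minimal sketch. The function names and the stratified-sampling scheme below are illustrative assumptions, not the paper's exact algorithm: the baseline keeps only the highest-scoring examples, while the coverage-aware variant partitions examples into importance-score strata and draws from every stratum so the coreset spans the full difficulty spectrum.

```python
import numpy as np

def importance_coreset(scores, budget):
    # Baseline: keep only the `budget` highest-importance examples.
    # At high pruning rates this concentrates on one region of the data.
    return np.argsort(scores)[-budget:]

def coverage_coreset(scores, budget, n_bins=10, rng=None):
    # Hedged sketch of coverage-centric selection (assumed scheme):
    # split examples into equal-size strata by importance-score rank,
    # then sample an equal share of the budget from each stratum.
    rng = np.random.default_rng(rng)
    order = np.argsort(scores)              # indices from easiest to hardest
    strata = np.array_split(order, n_bins)  # equal-size score strata
    per_bin = budget // n_bins
    picked = [rng.choice(s, size=min(per_bin, len(s)), replace=False)
              for s in strata]
    return np.concatenate(picked)
```

With 100 examples and a budget of 10, the baseline returns only the ten hardest examples, whereas the stratified variant returns one example from each score decile, preserving coverage of easy and hard regions alike.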


