Reinforce Data, Multiply Impact: Improved Model Accuracy and Robustness with Dataset Reinforcement
We propose Dataset Reinforcement, a strategy to improve a dataset once such that the accuracy of any model architecture trained on the reinforced dataset is improved at no additional training cost for users. Our Dataset Reinforcement strategy is based on data augmentation and knowledge distillation. This generic strategy is designed from an extensive analysis across CNN- and transformer-based models and a large-scale study of distillation with state-of-the-art models and various data augmentations. We create a reinforced version of the ImageNet training dataset, called ImageNet+, as well as reinforced datasets CIFAR-100+, Flowers-102+, and Food-101+. Models trained with ImageNet+ are more accurate, robust, and calibrated, and transfer well to downstream tasks (e.g., segmentation and detection). As an example, the accuracy of ResNet-50 improves by 1.7% […] ImageNetV2, and 10.0% […] ImageNet validation set is also reduced by 9.9%. […] Mask-RCNN for object detection on MS-COCO, the mean average precision improves by 0.8%. […] For MobileNetV3 and Swin-Tiny we observe significant improvements on ImageNet-R/A/C of up to 10% […] and fine-tuned on CIFAR-100+, Flowers-102+, and Food-101+, reach up to 3.4% improved accuracy.