SNT: Sharpness-Minimizing Network Transformation for Fast Compression-friendly Pretraining

05/08/2023
by Jung Hwan Heo, et al.

Model compression has become the de facto approach for optimizing the efficiency of vision models. Recently, the focus of most compression efforts has shifted to post-training scenarios due to the very high cost of large-scale pretraining. This has created the need to build compressible models from scratch, i.e., models that can be compressed effectively after training. In this work, we present a sharpness-minimizing network transformation (SNT) method, applied during pretraining, that creates models with desirable compressibility and generalizability features. We compare our approach to a well-known sharpness-minimizing optimizer to validate its efficacy in creating a flat loss landscape. To the best of our knowledge, SNT is the first pretraining method that uses an architectural transformation to generate compression-friendly networks. We find that SNT generalizes across different compression tasks and network backbones, delivering consistent improvements over the Adam baseline, with up to a 2% improvement on quantization. Code to reproduce our results will be made publicly available.
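The abstract contrasts SNT with a well-known sharpness-minimizing optimizer, presumably SAM (Sharpness-Aware Minimization). For readers unfamiliar with that baseline, below is a minimal PyTorch-style sketch of a SAM-style two-step update; it illustrates the optimizer-based approach that SNT is compared against, not the SNT transformation itself. The function name sam_step, the radius rho, and the training objects are illustrative assumptions, not code from the paper.

import torch

def sam_step(model, loss_fn, x, y, base_opt, rho=0.05):
    """One SAM-style update: perturb weights toward higher loss, then descend."""
    # 1) Gradients at the current weights.
    loss = loss_fn(model(x), y)
    loss.backward()

    # 2) Move to the approximate worst-case point within an L2 ball of radius rho.
    params = [p for p in model.parameters() if p.grad is not None]
    with torch.no_grad():
        grad_norm = torch.norm(torch.stack([p.grad.norm(2) for p in params]))
        eps = [rho * p.grad / (grad_norm + 1e-12) for p in params]
        for p, e in zip(params, eps):
            p.add_(e)  # ascend to the perturbed point

    # 3) Gradients at the perturbed weights define the actual descent direction.
    model.zero_grad()
    loss_fn(model(x), y).backward()

    # 4) Undo the perturbation and apply the base optimizer (e.g., Adam) step.
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)
    base_opt.step()
    base_opt.zero_grad()
    return loss.item()

Whereas this baseline flattens the loss landscape by modifying the update rule at every step, SNT aims for the same flat-minima effect through an architectural transformation applied during pretraining.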

Related research

HPTQ: Hardware-Friendly Post Training Quantization (09/19/2021)
Neural network quantization enables the deployment of models on edge dev...

Multi-pretrained Deep Neural Network (06/02/2016)
Pretraining is widely used in deep neural network and one of the most f...

METRO: Efficient Denoising Pretraining of Large Scale Autoencoding Language Models with Model Generated Signals (04/13/2022)
We present an efficient method of pretraining large-scale autoencoding l...

ParameterNet: Parameters Are All You Need for Large-scale Visual Pretraining of Mobile Networks (06/26/2023)
The large-scale visual pretraining has significantly improved the perform...

When Vision Transformers Outperform ResNets without Pretraining or Strong Data Augmentations (06/03/2021)
Vision Transformers (ViTs) and MLPs signal further efforts on replacing ...

Why Is Public Pretraining Necessary for Private Model Training? (02/19/2023)
In the privacy-utility tradeoff of a model trained on benchmark language...

Learning by Ignoring (12/28/2020)
Learning by ignoring, which identifies less important things and exclude...
