Towards Efficient Task-Driven Model Reprogramming with Foundation Models

by   Shoukai Xu, et al.

Vision foundation models exhibit impressive power, benefiting from the extremely large model capacity and broad training data. However, in practice, downstream scenarios may only support a small model due to the limited computational resources or efficiency considerations. Moreover, the data used for pretraining foundation models are usually invisible and very different from the target data of downstream tasks. This brings a critical challenge for the real-world application of foundation models: one has to transfer the knowledge of a foundation model to the downstream task that has a quite different architecture with only downstream target data. Existing transfer learning or knowledge distillation methods depend on either the same model structure or finetuning of the foundation model. Thus, naively introducing these methods can be either infeasible or very inefficient. To address this, we propose a Task-Driven Model Reprogramming (TDMR) framework. Specifically, we reprogram the foundation model to project the knowledge into a proxy space, which alleviates the adverse effect of task mismatch and domain inconsistency. Then, we reprogram the target model via progressive distillation from the proxy space to efficiently learn the knowledge from the reprogrammed foundation model. TDMR is compatible with different pre-trained model types (CNN, transformer or their mix) and limited target data, and promotes the wide applications of vision foundation models to downstream tasks in a cost-effective manner. Extensive experiments on different downstream classification tasks and target model structures demonstrate the effectiveness of our methods with both CNNs and transformer foundation models.


page 4

page 12


One to Transfer All: A Universal Transfer Framework for Vision Foundation Model with Few Data

The foundation model is not the last chapter of the model production pip...

A Billion-scale Foundation Model for Remote Sensing Images

As the potential of foundation models in visual tasks has garnered signi...

ViNT: A Foundation Model for Visual Navigation

General-purpose pre-trained models ("foundation models") have enabled pr...

DIME-FM: DIstilling Multimodal and Efficient Foundation Models

Large Vision-Language Foundation Models (VLFM), such as CLIP, ALIGN and ...

Molecular Structure-Property Co-Trained Foundation Model for In Silico Chemistry

Recently, deep learning approaches have been extensively studied for var...

TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter

Visual foundation models like CLIP excel in learning feature representat...

MS-DINO: Efficient Distributed Training of Vision Transformer Foundation Model in Medical Domain through Masked Sampling

In spite of the recent success of deep learning in the medical domain, t...

Please sign up or login with your details

Forgot password? Click here to reset