Improved Visual Fine-tuning with Natural Language Supervision

04/04/2023
by Junyang Wang, et al.

Fine-tuning a pre-trained model can leverage the semantic information from large-scale pre-training data and mitigate over-fitting on downstream tasks with limited training examples. While catastrophic forgetting in the backbone has been extensively studied, the potential bias that a pre-trained model inherits from its pre-training task and data has attracted less attention. In this work, we investigate this problem by demonstrating that the classifier obtained after fine-tuning will be close to that induced by the pre-trained model. To reduce the bias in the classifier effectively, we introduce a reference distribution obtained from a fixed text classifier, which helps regularize the learned vision classifier. The proposed method, Text Supervised fine-tuning (TeS), is evaluated with diverse pre-trained vision models, including ResNet and ViT, and text encoders, including BERT and CLIP, on 11 downstream tasks. The consistent improvement by a clear margin across distinct scenarios confirms the effectiveness of our proposal.
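
The idea of regularizing a vision classifier with a reference distribution from a fixed text classifier can be illustrated with a small sketch. The abstract does not state the exact objective, so everything below is an assumption made for illustration: the loss is taken to be cross-entropy plus a weighted KL divergence from the reference distribution to the vision prediction, and the names `text_regularized_loss`, `weight`, and `temperature` are hypothetical. Here `text_logits` stands for per-class scores produced by the fixed text classifier for the same image.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw scores into a probability distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Standard supervised loss on the ground-truth class."""
    return -math.log(probs[label])


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def text_regularized_loss(vision_logits, text_logits, label,
                          weight=0.5, temperature=1.0):
    """Assumed objective: cross-entropy + weight * KL(reference || prediction).

    The reference distribution comes from the fixed text classifier and is
    never updated; only the vision classifier would receive gradients.
    """
    prediction = softmax(vision_logits)
    reference = softmax(text_logits, temperature)
    return cross_entropy(prediction, label) + weight * kl_divergence(
        reference, prediction
    )


# Toy example with three classes; the numbers are made up for illustration.
vision_logits = [2.0, 0.5, -1.0]
text_logits = [1.0, 0.8, -0.5]
loss = text_regularized_loss(vision_logits, text_logits, label=0)
```

Under this assumed form, setting `weight=0` recovers plain fine-tuning with cross-entropy, while a larger `weight` pulls the vision prediction toward the text-derived reference.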
