Bi-tuning of Pre-trained Representations

by Jincheng Zhong, et al.

It is common practice in the deep learning community to first pre-train a deep neural network on a large-scale dataset and then fine-tune the pre-trained model on a specific downstream task. Recently, both supervised and unsupervised pre-training approaches to learning representations have achieved remarkable advances; they exploit the discriminative knowledge of labels and the intrinsic structure of data, respectively. It is natural to expect that both the discriminative knowledge and the intrinsic structure of a downstream task can be useful for fine-tuning; however, existing fine-tuning methods mainly leverage the former and discard the latter. A question thus arises: how can the intrinsic structure of data be fully explored to boost fine-tuning? In this paper, we propose Bi-tuning, a general learning framework for fine-tuning both supervised and unsupervised pre-trained representations on downstream tasks. Bi-tuning generalizes vanilla fine-tuning by integrating two heads upon the backbone of pre-trained representations: a classifier head with an improved contrastive cross-entropy loss to better leverage label information in an instance-contrast way, and a projector head with a newly designed categorical contrastive learning loss to fully exploit the intrinsic structure of data in a category-consistent way. Comprehensive experiments confirm that Bi-tuning achieves state-of-the-art results for fine-tuning both supervised and unsupervised pre-trained models by large margins (e.g., a 10.7% absolute rise in accuracy on CUB in the low-data regime).
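The two-head objective described above can be illustrated with a minimal numpy sketch. This is a simplified, hypothetical rendering, not the authors' implementation: the classifier head is modeled as a temperature-scaled cross-entropy over cosine similarities to per-class weight vectors, and the projector head as a supervised-contrastive-style loss that pulls together projections of same-class samples. Function names, the temperature `tau`, and the weighting `lam` are all illustrative assumptions.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Normalize rows to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def contrastive_cross_entropy(features, class_weights, labels, tau=0.07):
    # Classifier head (sketch): logits are cosine similarities between
    # backbone features and per-class weight vectors, scaled by temperature.
    logits = l2_normalize(features) @ l2_normalize(class_weights).T / tau
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def categorical_contrastive_loss(proj, labels, tau=0.07):
    # Projector head (sketch): pull together projections of same-class
    # samples and push apart different classes, in the style of
    # supervised contrastive learning.
    z = l2_normalize(proj)
    sim = np.exp(z @ z.T / tau)
    np.fill_diagonal(sim, 0.0)  # exclude self-similarity
    pos_mask = (labels[:, None] == labels[None, :]).astype(float)
    np.fill_diagonal(pos_mask, 0.0)
    pos = (sim * pos_mask).sum(axis=1)
    denom = sim.sum(axis=1)
    valid = pos_mask.sum(axis=1) > 0  # skip samples with no in-batch positive
    return -np.log(pos[valid] / denom[valid]).mean()

def bi_tuning_loss(features, proj, class_weights, labels, lam=1.0):
    # Total objective: classifier-head loss plus projector-head loss,
    # with an assumed trade-off weight lam.
    return (contrastive_cross_entropy(features, class_weights, labels)
            + lam * categorical_contrastive_loss(proj, labels))
```

In a real fine-tuning setup, `features` would come from the pre-trained backbone and `proj` from a small projection MLP on top of it, with gradients flowing through both heads into the backbone.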


