LaCViT: A Label-aware Contrastive Training Framework for Vision Transformers

03/31/2023
by   Zijun Long, et al.

Vision Transformers have been highly effective at computer vision tasks due to their ability to model long-range feature dependencies. By using large-scale training data and various self-supervised signals (e.g., masked random patches), vision transformers achieve state-of-the-art performance on several benchmark datasets, such as ImageNet-1k and CIFAR-10. However, vision transformers pretrained over general large-scale image corpora produce an anisotropic representation space, limiting their generalizability and transferability to downstream tasks. In this paper, we propose a simple and effective Label-aware Contrastive Training framework, LaCViT, which improves the isotropy of the pretrained representation space for vision transformers, thereby enabling more effective transfer learning across a wide range of image classification tasks. Through experimentation over five standard image classification datasets, we demonstrate that LaCViT-trained models outperform the original pretrained baselines by around 9%, with consistent improvements observed when applying LaCViT to our three evaluated vision transformers.
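The core idea of label-aware contrastive training is to pull together representations of images sharing the same label while pushing apart those of different classes. The paper's exact loss is not reproduced here; the sketch below is a minimal NumPy implementation of a supervised (label-aware) contrastive loss in the style of SupCon, with all function and variable names being illustrative assumptions rather than the authors' API:

```python
import numpy as np

def label_aware_contrastive_loss(embeddings, labels, temperature=0.1):
    """Sketch of a supervised (label-aware) contrastive loss.

    embeddings: (N, D) array of image representations (L2-normalized here).
    labels:     (N,) integer class labels.
    Anchors with no same-class positive in the batch are skipped.
    """
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    n = len(labels)
    # Pairwise cosine similarities, scaled by temperature.
    logits = (z @ z.T) / temperature
    # Exclude each sample's similarity with itself.
    logits = logits - 1e9 * np.eye(n)
    # Row-wise log-softmax: each other sample in the batch is a candidate.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives are *other* samples carrying the same label (label-aware part).
    pos_mask = (labels[:, None] == labels[None, :]) & ~np.eye(n, dtype=bool)
    pos_counts = pos_mask.sum(axis=1)
    valid = pos_counts > 0
    # Average negative log-probability over each anchor's positives.
    per_anchor = -(log_prob * pos_mask).sum(axis=1)[valid] / pos_counts[valid]
    return per_anchor.mean()
```

Under this formulation, a batch whose embeddings already cluster by label yields a lower loss than the same embeddings paired with mismatched labels, which is the training signal that encourages a more isotropic, class-separated representation space.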


