Self-Promoted Supervision for Few-Shot Transformer

by Bowen Dong, et al.

The few-shot learning ability of vision transformers (ViTs) is highly desirable but rarely investigated. In this work, we empirically find that within the same few-shot learning framework, e.g., Meta-Baseline, replacing the widely used CNN feature extractor with a ViT model often severely impairs few-shot classification performance. Moreover, our empirical study shows that, in the absence of inductive bias, ViTs often learn the dependencies among input tokens slowly under the few-shot learning regime, where only a few labeled training samples are available, and that this slow learning largely accounts for the performance degradation. To alleviate this issue, we propose, for the first time, a simple yet effective few-shot training framework for ViTs, namely Self-promoted sUpervisioN (SUN). Specifically, besides the conventional global supervision for global semantic learning, SUN first pretrains the ViT on the few-shot learning dataset and then uses it to generate individual location-specific supervision for guiding each patch token. This location-specific supervision tells the ViT which patch tokens are similar or dissimilar and thus accelerates token dependency learning. Moreover, it models the local semantics in each patch token to improve object grounding and recognition capability, which helps the model learn generalizable patterns. To improve the quality of location-specific supervision, we further propose two techniques: 1) background patch filtration, which filters out background patches and assigns them to an extra background class; and 2) spatial-consistent augmentation, which introduces sufficient diversity for data augmentation while preserving the accuracy of the generated local supervision. Experimental results show that SUN with ViTs significantly surpasses other few-shot learning frameworks with ViTs and is the first to achieve higher performance than state-of-the-art CNN-based methods.
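
To make the location-specific supervision and background patch filtration described above more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes that a pretrained teacher ViT emits one vector of class logits per patch token, and that a patch counts as background when the teacher's highest class probability falls below a threshold. The function names (`patch_pseudo_labels`, `local_loss`), the `bg_threshold` parameter, and the thresholding rule itself are hypothetical choices made for illustration; the abstract does not specify how background patches are identified.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def patch_pseudo_labels(teacher_patch_logits, bg_threshold=0.5):
    """Turn per-patch teacher logits (C classes) into soft targets over C + 1 classes.

    Assumption (not from the abstract): a patch whose highest teacher
    probability is below `bg_threshold` is treated as background and
    receives a one-hot target on the extra, last class.
    """
    targets = []
    for logits in teacher_patch_logits:
        probs = softmax(logits)
        if max(probs) < bg_threshold:
            # Background patch: all mass on the extra background class.
            targets.append([0.0] * len(probs) + [1.0])
        else:
            # Foreground patch: keep the teacher distribution, no background mass.
            targets.append(probs + [0.0])
    return targets


def local_loss(student_patch_logits, targets):
    """Mean cross-entropy between student patch logits (C + 1) and soft targets."""
    total = 0.0
    for logits, target in zip(student_patch_logits, targets):
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += -sum(t * (x - log_z) for t, x in zip(target, logits))
    return total / len(targets)


# Two patches, three classes: one confident patch and one ambiguous patch.
teacher_logits = [[5.0, 0.0, 0.0], [0.1, 0.0, 0.1]]
targets = patch_pseudo_labels(teacher_logits)
student_logits = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
loss = local_loss(student_logits, targets)
```

In a full training loop this per-patch loss would presumably be combined with the conventional global classification loss on the image-level label; how the two terms are weighted is not stated in the abstract and is left open here.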


