Prompting through Prototype: A Prototype-based Prompt Learning on Pretrained Vision-Language Models

by Yue Zhang, et al.

Prompt learning is a new learning paradigm that uses textual prompts to reformulate downstream tasks so that they resemble the pretraining tasks of pretrained models. Recent works have demonstrated that prompt learning is particularly useful for few-shot learning, where training data is limited. Depending on the granularity of the prompts, these methods can be roughly divided into task-level prompting and instance-level prompting. Task-level prompting methods learn one universal prompt for all input samples, which is efficient but fails to capture subtle differences among classes. Instance-level prompting methods learn a specific prompt for each input, which is effective but inefficient. In this work, we develop a novel prototype-based prompt learning method to overcome these limitations. In particular, we focus on few-shot image recognition tasks on pretrained vision-language models (PVLMs) and develop a method of prompting through prototype (PTP), in which we define K image prototypes and K prompt prototypes. In PTP, an image prototype represents the centroid of a certain image cluster in the latent space, and a prompt prototype is defined as a soft prompt in the continuous space. The similarity between a query image and an image prototype determines how much the prediction for that image relies on the corresponding prompt prototype. Hence, in PTP, similar images are prompted in similar ways. Through extensive experiments on seven real-world benchmarks, we show that PTP is an effective method for leveraging latent knowledge and that it adapts to various PVLMs. Moreover, through detailed analysis, we discuss the pros and cons of prompt learning and parameter-efficient fine-tuning in the context of few-shot learning.
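
The weighting idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the use of cosine similarity, the softmax with a `temperature` parameter, and the function names `prototype_weights` and `mix_predictions` are all assumptions made for illustration. The sketch also assumes that each of the K prompt prototypes has already produced a class-probability vector for the query image, and it only shows how those K predictions might be combined.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)  # small constant avoids division by zero


def prototype_weights(query, image_prototypes, temperature=0.1):
    """Softmax over query-to-prototype similarities (assumed weighting scheme)."""
    scores = [cosine(query, p) / temperature for p in image_prototypes]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_predictions(weights, per_prompt_probs):
    """Weighted average of the class probabilities from each prompt prototype."""
    num_classes = len(per_prompt_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, per_prompt_probs))
        for c in range(num_classes)
    ]


# Toy setup: K = 2 prototypes in a 2-dimensional latent space, 2 classes.
image_prototypes = [[1.0, 0.0], [0.0, 1.0]]
# Hypothetical class probabilities obtained with each prompt prototype.
per_prompt_probs = [[0.9, 0.1], [0.2, 0.8]]

query = [0.9, 0.1]  # query embedding lying close to the first prototype
weights = prototype_weights(query, image_prototypes)
mixed = mix_predictions(weights, per_prompt_probs)
print(weights, mixed)
```

In this toy case the query lies near the first image prototype, so the first prompt prototype's prediction dominates the mixture. A lower `temperature` makes the weighting closer to a hard assignment, while a higher one blends the prompt prototypes more evenly.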




Unsupervised Prototype Adapter for Vision-Language Models

Recently, large-scale pre-trained vision-language models (e.g. CLIP and ...

MerA: Merging Pretrained Adapters For Few-Shot Learning

Adapter tuning, which updates only a few parameters, has become a mainst...

Multimodal Few-Shot Object Detection with Meta-Learning Based Cross-Modal Prompting

We study multimodal few-shot object detection (FSOD) in this paper, usin...

Rectifying the Shortcut Learning of Background: Shared Object Concentration for Few-Shot Image Recognition

Few-Shot image classification aims to utilize pretrained knowledge learn...

Instance-wise Prompt Tuning for Pretrained Language Models

Prompt Learning has recently gained great popularity in bridging the gap...

AnyOKP: One-Shot and Instance-Aware Object Keypoint Extraction with Pretrained ViT

Towards flexible object-centric visual perception, we propose a one-shot...

Instance-aware Prompt Learning for Language Understanding and Generation

Recently, prompt learning has become a new paradigm to utilize pre-train...
