Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models

08/22/2023
by Baoshuo Kan, et al.

Pre-trained vision-language models such as CLIP, working with manually designed prompts, have demonstrated great capacity for transfer learning. Recently, learnable prompts have achieved state-of-the-art performance; however, they are prone to overfitting seen classes and fail to generalize to unseen classes. In this paper, we propose a Knowledge-Aware Prompt Tuning (KAPT) framework for vision-language models. Our approach takes inspiration from human intelligence, in which external knowledge is usually incorporated when recognizing novel categories of objects. Specifically, we design two complementary types of knowledge-aware prompts for the text encoder to leverage the distinctive characteristics of category-related external knowledge: the discrete prompt extracts key information from descriptions of an object category, and the learned continuous prompt captures the overall context. We further design an adaptation head for the visual encoder that aggregates salient attentive visual cues, establishing discriminative and task-aware visual representations. We conduct extensive experiments on 11 widely used benchmark datasets, and the results verify the effectiveness of our approach in few-shot image classification, especially in generalizing to unseen categories. Compared with the state-of-the-art CoCoOp method, KAPT exhibits favorable performance and achieves an absolute gain of 3.22%.
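The abstract describes two text-side prompts (a discrete prompt distilled from external category descriptions and a learned continuous prompt) plus a visual adaptation head that pools salient attentive cues. As a rough illustration of how such components might compose, here is a minimal NumPy sketch; all names, dimensions, and the attention-pooling choice are hypothetical stand-ins, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8        # embedding dimension (hypothetical)
n_ctx = 4    # number of learned continuous prompt tokens (hypothetical)

# Discrete prompt: embeddings of key tokens extracted from an external
# description of the category (random stand-ins here).
discrete_prompt = rng.normal(size=(3, D))

# Continuous prompt: freely learned context vectors (optimized in practice).
continuous_prompt = rng.normal(size=(n_ctx, D))

class_token = rng.normal(size=(1, D))

# Text-side input: concatenate both prompt types with the class token,
# mirroring the "two complementary prompts" idea from the abstract.
text_input = np.concatenate(
    [discrete_prompt, continuous_prompt, class_token], axis=0
)

def adaptation_head(patch_feats, query):
    """Attention-pool patch features with a task-aware query vector."""
    scores = patch_feats @ query / np.sqrt(query.size)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ patch_feats  # weighted sum over salient patches

patch_feats = rng.normal(size=(16, D))  # visual encoder patch outputs
query = rng.normal(size=D)              # learned query (hypothetical)
visual_feat = adaptation_head(patch_feats, query)

# Cosine similarity between the pooled visual feature and a text feature
# (the mean is only a stand-in for a real text encoder).
text_feat = text_input.mean(axis=0)
sim = visual_feat @ text_feat / (
    np.linalg.norm(visual_feat) * np.linalg.norm(text_feat)
)
print(text_input.shape, visual_feat.shape)
```

In an actual prompt-tuning setup, only the continuous prompt and the adaptation head's parameters would receive gradients, while the CLIP encoders stay frozen.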

Related research

- 10/19/2022 · CPL: Counterfactual Prompt Learning for Vision and Language Models
  Prompt tuning is a new few-shot transfer learning technique that only tu...

- 09/06/2023 · Distribution-Aware Prompt Tuning for Vision-Language Models
  Pre-trained vision-language models (VLMs) have shown impressive performa...

- 09/14/2023 · Gradient Constrained Sharpness-Aware Prompt Learning for Vision-Language Models
  This paper targets a novel trade-off problem in generalizable prompt lea...

- 04/20/2022 · K-LITE: Learning Transferable Visual Models with External Knowledge
  Recent state-of-the-art computer vision systems are trained from natural...

- 12/08/2022 · Learning Domain Invariant Prompt for Vision-Language Models
  Prompt learning is one of the most effective and trending ways to adapt ...

- 05/17/2023 · CLIP-GCD: Simple Language Guided Generalized Category Discovery
  Generalized Category Discovery (GCD) requires a model to both classify k...

- 10/13/2022 · Visual Classification via Description from Large Language Models
  Vision-language models (VLMs) such as CLIP have shown promising performa...
