CLIPood: Generalizing CLIP to Out-of-Distributions

02/02/2023
by Yang Shu, et al.

Out-of-distribution (OOD) generalization, where the model needs to handle distribution shifts from training, is a major challenge of machine learning. Recently, contrastive language-image pre-training (CLIP) models have shown impressive zero-shot ability, revealing a promising path toward OOD generalization. However, to further improve upon zero-shot performance, additional adaptation of CLIP on downstream tasks is indispensable but undesirably degrades OOD generalization ability. In this paper, we aim to generalize CLIP to out-of-distribution test data on downstream tasks. Beyond the two canonical OOD situations, domain shift and open class, we tackle a more general but difficult in-the-wild setting where both OOD situations may occur on the unseen test data. We propose CLIPood, a simple fine-tuning method that can adapt CLIP models to all OOD situations. To exploit semantic relations between classes from the text modality, CLIPood introduces a new training objective, margin metric softmax (MMS), with class-adaptive margins for fine-tuning. Moreover, to incorporate both the pre-trained zero-shot model and the fine-tuned task-adaptive model, CLIPood adopts a new Beta moving average (BMA) to maintain a temporal ensemble of weights according to a Beta distribution. Experiments on diverse datasets with different OOD scenarios show that CLIPood consistently outperforms existing generalization techniques.
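The sketch below is based only on the abstract above, so the exact margin form, the Beta(0.5, 0.5) parameters, the helper names margin_metric_softmax_loss and BetaMovingAverage, and all hyperparameter values are illustrative assumptions rather than the paper's definitions. It is meant only to make the two ideas concrete in PyTorch: class-adaptive margins derived from text-embedding similarity (MMS), and a Beta-weighted temporal ensemble of model weights (BMA).

```python
import copy
import math

import torch
import torch.nn.functional as F


def margin_metric_softmax_loss(image_feats, text_feats, labels,
                               temperature=0.01, margin_scale=0.1):
    """Cross-entropy over image-text similarities with class-adaptive margins.

    Assumption: the margin added to each non-target class grows with that
    class's text-embedding distance to the ground-truth class, so semantically
    close classes are penalized less.
    """
    image_feats = F.normalize(image_feats, dim=-1)    # (B, D) image embeddings
    text_feats = F.normalize(text_feats, dim=-1)      # (C, D) class text embeddings

    sims = image_feats @ text_feats.t()               # (B, C) cosine similarities
    class_sim = text_feats @ text_feats.t()           # (C, C) class semantic relations
    margins = margin_scale * (1.0 - class_sim[labels])            # (B, C)
    one_hot = F.one_hot(labels, num_classes=text_feats.size(0)).to(sims.dtype)
    margins = margins * (1.0 - one_hot)               # no margin on the true class
    return F.cross_entropy((sims + margins) / temperature, labels)


class BetaMovingAverage:
    """Temporal weight ensemble whose per-step weights follow a Beta density.

    With alpha = beta = 0.5 (an assumed choice), the density is large near both
    ends of training, so checkpoints close to the zero-shot initialization and
    close to the final fine-tuned model both contribute strongly.
    """

    def __init__(self, model, total_steps, alpha=0.5, beta=0.5):
        self.avg_model = copy.deepcopy(model)   # starts from the zero-shot weights
        self.total_steps = total_steps
        self.alpha, self.beta = alpha, beta
        self.weight_sum = self._beta_pdf(0.0)   # weight given to the zero-shot model
        self.step = 0

    def _beta_pdf(self, x, eps=1e-3):
        x = min(max(x, eps), 1.0 - eps)
        log_b = (math.lgamma(self.alpha) + math.lgamma(self.beta)
                 - math.lgamma(self.alpha + self.beta))
        return math.exp((self.alpha - 1) * math.log(x)
                        + (self.beta - 1) * math.log(1.0 - x) - log_b)

    @torch.no_grad()
    def update(self, model):
        """Fold the current fine-tuned weights into the running Beta-weighted average."""
        self.step += 1
        w = self._beta_pdf(self.step / self.total_steps)
        self.weight_sum += w
        ratio = w / self.weight_sum
        for p_avg, p in zip(self.avg_model.parameters(), model.parameters()):
            p_avg.mul_(1.0 - ratio).add_(p, alpha=ratio)
```

In a fine-tuning loop one would compute the loss on each batch, call bma.update(model) after every optimizer step, and evaluate bma.avg_model at test time; again, this is only one plausible way the pieces could fit together, not the paper's exact training procedure.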

Related research

- 11/29/2022 · Context-Aware Robust Fine-Tuning: Contrastive Language-Image Pre-trained (CLIP) models have zero-shot abil...
- 02/13/2023 · A Simple Zero-shot Prompt Weighting Technique to Improve Prompt Ensembling in Text-Image Models: Contrastively trained text-image models have the remarkable ability to p...
- 05/18/2023 · A Generalist Dynamics Model for Control: We investigate the use of transformer sequence models as dynamics models...
- 08/11/2023 · Diverse Data Augmentation with Diffusions for Effective Test-time Prompt Tuning: Benefiting from prompt tuning, recent years have witnessed the promising...
- 11/06/2022 · Momentum-based Weight Interpolation of Strong Zero-Shot Models for Continual Learning: Large pre-trained, zero-shot capable models have shown considerable succ...
- 09/15/2023 · Audio-free Prompt Tuning for Language-Audio Models: Contrastive Language-Audio Pretraining (CLAP) is pre-trained to associat...
- 05/30/2022 · Prompt-aligned Gradient for Prompt Tuning: Thanks to the large pre-trained vision-language models (VLMs) like CLIP,...
