CAILA: Concept-Aware Intra-Layer Adapters for Compositional Zero-Shot Learning

05/26/2023
by Zhaoheng Zheng, et al.

Compositionality, the ability to combine existing concepts and generalize to novel compositions, is a key capability of intelligent entities. We study the problem of Compositional Zero-Shot Learning (CZSL), which aims at recognizing novel attribute-object compositions. Recent approaches build their systems on top of large-scale Vision-Language Pre-trained (VLP) models, e.g. CLIP, and observe significant improvements. However, these methods treat CLIP as a black box and focus on pre- and post-CLIP operations. Instead, we propose to dive into the architecture and insert adapters, a parameter-efficient technique proven effective for large language models, into each CLIP encoder layer. We further equip the adapters with concept awareness so that concept-specific features of "object", "attribute" and "composition" can be extracted. We name our method CAILA, Concept-Aware Intra-Layer Adapters. Quantitative evaluations on three popular CZSL datasets, MIT-States, C-GQA, and UT-Zappos, show that CAILA achieves double-digit relative improvements over the current state of the art on all benchmarks.
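To make the idea of concept-aware adapters inside an encoder layer more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify the adapter design, so this assumes a generic bottleneck adapter (down-projection, ReLU, up-projection, residual connection) with one independent adapter per concept. The names `BottleneckAdapter`, `ConceptAwareAdapters`, `CONCEPTS`, the dimensions, and the zero initialization of the up-projection are all hypothetical choices made for illustration.

```python
import random

# Concept names taken from the abstract; everything else is an assumption.
CONCEPTS = ("attribute", "object", "composition")


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


class BottleneckAdapter:
    """Generic bottleneck adapter: x + up(relu(down(x))). Assumed design."""

    def __init__(self, dim, bottleneck, rng):
        # Down-projection (bottleneck x dim) with small random weights.
        self.down = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                     for _ in range(bottleneck)]
        # Up-projection (dim x bottleneck) starts at zero, so the adapter
        # initially leaves the frozen encoder's features unchanged.
        self.up = [[0.0] * bottleneck for _ in range(dim)]

    def __call__(self, x):
        hidden = [max(0.0, h) for h in matvec(self.down, x)]  # ReLU
        delta = matvec(self.up, hidden)
        return [xi + di for xi, di in zip(x, delta)]  # residual connection


class ConceptAwareAdapters:
    """One adapter per concept, meant to sit inside a single encoder layer."""

    def __init__(self, dim, bottleneck, seed=0):
        rng = random.Random(seed)
        self.adapters = {c: BottleneckAdapter(dim, bottleneck, rng)
                         for c in CONCEPTS}

    def __call__(self, x, concept):
        # Route the layer's features through the adapter for this concept.
        return self.adapters[concept](x)


if __name__ == "__main__":
    layer = ConceptAwareAdapters(dim=4, bottleneck=2)
    features = [1.0, -2.0, 0.5, 3.0]
    for concept in CONCEPTS:
        print(concept, layer(features, concept))
```

In this sketch the pre-trained encoder weights would stay frozen and only the small `down` and `up` matrices would be trained, which is what makes adapters parameter-efficient. How CAILA actually combines the three concept-specific features is described in the full paper, not here.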

research
11/09/2022

Prompting Large Pre-trained Vision-Language Models For Compositional Concept Learning

This work explores the zero-shot compositional learning ability of large...
research
07/12/2021

Zero-Shot Compositional Concept Learning

In this paper, we study the problem of recognizing compositional attribu...
research
03/27/2023

Learning Attention as Disentangler for Compositional Zero-shot Learning

Compositional zero-shot learning (CZSL) aims at learning visual concepts...
research
06/11/2019

Task-Aware Deep Sampling for Feature Generation

The human ability to imagine the variety of appearances of novel objects...
research
05/15/2019

Task-Driven Modular Networks for Zero-Shot Compositional Learning

One of the hallmarks of human intelligence is the ability to compose lea...
research
12/31/2020

A Closer Look at Few-Shot Crosslingual Transfer: Variance, Benchmarks and Baselines

We present a focused study of few-shot crosslingual transfer, a recently...
research
09/22/2021

COVR: A test-bed for Visually Grounded Compositional Generalization with real images

While interest in models that generalize at test time to new composition...
