Open-Vocabulary Semantic Segmentation via Attribute Decomposition-Aggregation

by Chaofan Ma, et al.

Open-vocabulary semantic segmentation is a challenging task that requires segmenting novel object categories at inference time. Recent works explore vision-language pre-training to handle this task, but they rest on assumptions that are unrealistic in practice, because textual category names are often of low quality. For example, this paradigm assumes that new textual categories will be provided accurately and completely, and that they exist in the lexicons used during pre-training. However, exceptions often arise: brief or incomplete names are ambiguous, new words are absent from the pre-trained lexicons, and some categories are difficult for users to describe. To address these issues, this work proposes a novel decomposition-aggregation framework, inspired by human cognition in understanding new concepts. Specifically, in the decomposition stage, we decouple class names into diverse attribute descriptions to enrich semantic contexts. Two attribute construction strategies are designed: using large language models for common categories, and using manual labelling for human-invented categories. In the aggregation stage, we group diverse attributes into an integrated global description, to form a discriminative classifier that distinguishes the target object from others. A hierarchical aggregation scheme is further designed to achieve multi-level alignment and deep fusion between vision and text. The final result is obtained by computing the embedding similarity between aggregated attributes and images. To evaluate the framework's effectiveness, we annotate three datasets with attribute descriptions, and conduct extensive experiments and ablation studies. The results show the superior performance of attribute decomposition-aggregation.
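
The final step described in the abstract, scoring image embeddings against aggregated attribute embeddings, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the architecture, so mean pooling stands in for the hierarchical aggregation, the embeddings are hand-written toy vectors rather than outputs of a vision-language model, and all names (`aggregate_attributes`, `segment`, `attrs_a`, `attrs_b`) are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def aggregate_attributes(attr_embeddings):
    """Collapse attribute embeddings (A, D) into one class embedding (D,).

    Assumption: plain mean pooling, used here only as a stand-in for the
    hierarchical aggregation mentioned in the abstract.
    """
    return l2_normalize(l2_normalize(attr_embeddings).mean(axis=0))


def segment(pixel_embeddings, class_attr_embeddings):
    """Assign each pixel to the class whose aggregated embedding is most similar.

    pixel_embeddings: array of shape (H, W, D)
    class_attr_embeddings: list of K arrays, each of shape (A_k, D)
    Returns an (H, W) array of class indices.
    """
    classifiers = np.stack(
        [aggregate_attributes(a) for a in class_attr_embeddings]
    )  # (K, D)
    pixels = l2_normalize(pixel_embeddings)  # (H, W, D)
    similarity = pixels @ classifiers.T  # cosine similarity, (H, W, K)
    return similarity.argmax(axis=-1)


# Toy data: two classes, two attribute embeddings each, D = 3.
attrs_a = np.array([[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]])
attrs_b = np.array([[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]])

# A 2x2 "image" of per-pixel embeddings.
pixels = np.array(
    [
        [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
        [[0.8, 0.2, 0.0], [0.1, 0.7, 0.1]],
    ]
)

mask = segment(pixels, [attrs_a, attrs_b])
print(mask)
```

In this sketch each class is represented by several attribute vectors instead of a single class-name vector, which is the intuition behind the decomposition stage: an ambiguous or unseen name can still be matched through its described attributes.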


