Zero-Shot Object Detection: Learning to Simultaneously Recognize and Localize Novel Concepts

by Shafin Rahman, et al.

Current Zero-Shot Learning (ZSL) approaches are restricted to recognizing a single dominant unseen object category in a test image. We hypothesize that this setting is ill-suited for real-world applications, where unseen objects appear only as part of a complex scene, warranting both the 'recognition' and 'localization' of an unseen category. To address this limitation, we introduce a new 'Zero-Shot Detection' (ZSD) problem setting, which aims at simultaneously recognizing and locating object instances belonging to novel categories without any training examples. We also propose a new experimental protocol for ZSD based on the highly challenging ILSVRC dataset that accounts for practical issues, e.g., the rarity of unseen objects. We present what is, to the best of our knowledge, the first end-to-end deep network for ZSD that jointly models the interplay between visual and semantic domain information. To overcome the noise in the automatically derived semantic descriptions, we utilize the concept of meta-classes to design an original loss function that achieves synergy between max-margin class separation and semantic space clustering. Furthermore, we present a baseline approach extended from the recognition setting to the detection setting. Our extensive experiments show a significant performance boost over the baseline on the important yet difficult ZSD problem.



Any-Shot Object Detection

Previous work on novel object detection considers zero or few-shot setti...

Instance-based Max-margin for Practical Few-shot Recognition

In order to mimic the human few-shot learning (FSL) ability better and t...

Polarity Loss for Zero-shot Object Detection

Zero-shot object detection is an emerging research topic that aims to re...

DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection

Open-world object detection, as a more general and challenging goal, aim...

Few-shot Learning with Contextual Cueing for Object Recognition in Complex Scenes

Few-shot Learning aims to recognize new concepts from a small number of ...

COBE: Contextualized Object Embeddings from Narrated Instructional Video

Many objects in the real world undergo dramatic variations in visual app...

Incrementally Zero-Shot Detection by an Extreme Value Analyzer

Human beings not only have the ability to recognize novel unseen classes...
