DQnet: Cross-Model Detail Querying for Camouflaged Object Detection

by Wei Sun, et al.

Camouflaged objects blend seamlessly into their surroundings, which makes detecting them a challenging task in computer vision. Optimizing a convolutional neural network (CNN) for camouflaged object detection (COD) tends to activate local discriminative regions while ignoring the complete object extent. This partial activation issue inevitably leads to missing or redundant object regions. In this paper, we argue that partial activation is caused by the intrinsic characteristics of CNNs: convolution operations produce local receptive fields and have difficulty capturing long-range feature dependencies among image regions. To obtain feature maps that activate the full object extent while keeping the segmentation results from being overwhelmed by noisy features, we propose a novel framework termed the Cross-Model Detail Querying network (DQnet). It reasons about the relations between long-range-aware representations and multi-scale local details so that the enhanced representation fully highlights object regions and suppresses noise in non-object regions. Specifically, a vanilla ViT pretrained with self-supervised learning (SSL) is employed to model long-range dependencies among image regions, and a ResNet is employed to learn fine-grained spatial local details at multiple scales. Then, to effectively retrieve object-related details, a Relation-Based Querying (RBQ) module is proposed to explore window-based interactions between the global representations and the multi-scale local details. Extensive experiments on widely used COD datasets show that DQnet outperforms current state-of-the-art methods.
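
The idea of window-based interaction between global representations and local details can be illustrated with a small sketch. This is not the RBQ module itself, whose details the abstract does not give. It is a minimal NumPy example that assumes both feature maps have already been brought to the same spatial size and channel count, uses plain scaled dot-product attention without learned projections, and restricts each query to its own non-overlapping window. The function names (`window_partition`, `window_merge`, `window_cross_query`) and the window size are hypothetical.

```python
import numpy as np


def window_partition(x, window):
    """Split an (H, W, C) map into (num_windows, window*window, C) token groups."""
    h, w, c = x.shape
    assert h % window == 0 and w % window == 0, "H and W must be divisible by window"
    x = x.reshape(h // window, window, w // window, window, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (rows of windows, cols of windows, window, window, C)
    return x.reshape(-1, window * window, c)


def window_merge(tokens, h, w, window):
    """Inverse of window_partition: rebuild the (H, W, C) map."""
    c = tokens.shape[-1]
    x = tokens.reshape(h // window, w // window, window, window, c)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(h, w, c)


def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def window_cross_query(global_feat, local_feat, window=4):
    """Query local details with global features, one window at a time.

    Assumption (not from the paper): queries come from the global map,
    keys and values come from the local map, and no learned projections
    are applied.
    """
    h, w, c = global_feat.shape
    assert local_feat.shape == global_feat.shape, "maps must share shape"
    q = window_partition(global_feat, window)   # (N, T, C)
    kv = window_partition(local_feat, window)   # (N, T, C)
    # Attention weights between every query and every local token in the same window.
    attn = softmax(q @ kv.transpose(0, 2, 1) / np.sqrt(c), axis=-1)  # (N, T, T)
    out = attn @ kv                              # (N, T, C)
    return window_merge(out, h, w, window), attn
```

Because attention is computed per window, a change to the local details in one window cannot affect the output in any other window; the long-range context enters only through the global features that act as queries.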




