Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers

08/16/2021
by Bo Dong, et al.

Most polyp segmentation methods use CNNs as their backbone, leading to two key issues when exchanging information between the encoder and decoder: 1) taking into account the differences in contribution between different-level features; and 2) designing an effective mechanism for fusing these features. Unlike existing CNN-based methods, we adopt a transformer encoder, which learns more powerful and robust representations. In addition, considering the influence of image acquisition and the elusive properties of polyps, we introduce three novel modules: a cascaded fusion module (CFM), a camouflage identification module (CIM), and a similarity aggregation module (SAM). Among these, the CFM is used to collect the semantic and location information of polyps from high-level features, while the CIM is applied to capture polyp information disguised in low-level features. With the help of the SAM, we extend the pixel features of the polyp area with high-level semantic position information to the entire polyp area, thereby effectively fusing cross-level features. The proposed model, named Polyp-PVT, effectively suppresses noise in the features and significantly improves their expressive capabilities. Extensive experiments on five widely adopted datasets show that the proposed model is more robust to various challenging situations (e.g., appearance changes, small objects) than existing methods, and achieves new state-of-the-art performance. The proposed model is available at https://github.com/DengPingFan/Polyp-PVT.
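The data flow described in the abstract can be illustrated with a small, tensor-free sketch. It only traces which encoder levels reach which module and is not the released implementation. The four-level pyramid, the choice to send the lowest level to the CIM and the remaining levels to the CFM, and all names (`Feature`, `encode`, `merge`, `polyp_pvt_flow`) are assumptions made for illustration; the abstract only states that the CFM works on high-level features, the CIM on low-level features, and the SAM fuses the two.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Feature:
    """Stand-in for a feature map; tracks only which encoder levels fed it."""
    name: str
    sources: Tuple[str, ...]


def encode(num_levels: int = 4) -> List[Feature]:
    # Assumption: the transformer encoder yields a feature pyramid,
    # x1 (lowest level) through x4 (highest level).
    return [Feature(f"x{i}", (f"x{i}",)) for i in range(1, num_levels + 1)]


def merge(name: str, *inputs: Feature) -> Feature:
    # Placeholder for a learned module: it only records what it consumed.
    sources = tuple(s for f in inputs for s in f.sources)
    return Feature(name, sources)


def polyp_pvt_flow() -> Feature:
    # Assumption: x1 is treated as "low-level", x2..x4 as "high-level".
    x1, *high = encode()
    semantic = merge("cfm", *high)         # CFM: semantic and location cues
    detail = merge("cim", x1)              # CIM: cues hidden in low-level features
    return merge("sam", semantic, detail)  # SAM: cross-level fusion


if __name__ == "__main__":
    out = polyp_pvt_flow()
    print(out.name, out.sources)
```

In a real model each `merge` call would be replaced by a learned module operating on tensors; the sketch only makes explicit that the final prediction depends on every encoder level through two separate paths.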


