Laplacian-Former: Overcoming the Limitations of Vision Transformers in Local Texture Detection

08/31/2023
by Reza Azad, et al.

Vision Transformer (ViT) models have demonstrated a breakthrough in a wide range of computer vision tasks. However, compared to Convolutional Neural Network (CNN) models, ViT models have been observed to struggle to capture the high-frequency components of images, which can limit their ability to detect local textures and edge information. Because abnormalities in human tissue, such as tumors and lesions, may vary greatly in structure, texture, and shape, high-frequency information such as texture is crucial for effective semantic segmentation. To address this limitation in ViT models, we propose a new technique, Laplacian-Former, that enhances the self-attention map by adaptively re-calibrating the frequency information in a Laplacian pyramid. More specifically, our proposed method uses a dual attention mechanism that combines efficient attention and frequency attention: the efficient attention mechanism reduces the complexity of self-attention to linear while producing the same output, and the frequency attention selectively intensifies the contribution of shape and texture features. Furthermore, we introduce a novel efficient enhancement multi-scale bridge that effectively transfers spatial information from the encoder to the decoder while preserving the fundamental features. We demonstrate the efficacy of Laplacian-Former on multi-organ and skin lesion segmentation tasks, with Dice score improvements of +1.87% and +0.76% over SOTA approaches, respectively. Our implementation is publicly available at https://github.com/mindflow-institue/Laplacian-Former
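
To make the idea of re-calibrating frequency information in a Laplacian pyramid more concrete, here is a minimal NumPy sketch. It is an illustration only and not the authors' implementation: it works on a plain 2D array rather than on attention features, it builds the pyramid as differences of Gaussian blurs without downsampling, and it uses fixed per-band weights as a stand-in for whatever adaptive weighting the method learns. The function names `blur`, `laplacian_bands`, and `recalibrate` and the `sigmas` values are assumptions made for this example.

```python
import numpy as np

def gaussian_kernel(sigma):
    # 1D Gaussian truncated at 3 standard deviations, normalized to sum to 1.
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def blur(img, sigma):
    # Separable Gaussian blur with reflect padding; output has the input's shape.
    k = gaussian_kernel(sigma)
    pad = len(k) // 2
    padded = np.pad(img, pad, mode="reflect")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def laplacian_bands(img, sigmas=(1.0, 2.0, 4.0)):
    # Each band is the difference between two successive smoothing levels,
    # ordered from the highest frequencies to the lowest.
    levels = [img] + [blur(img, s) for s in sigmas]
    bands = [levels[i] - levels[i + 1] for i in range(len(sigmas))]
    return bands, levels[-1]  # band-pass layers and the low-frequency residual

def recalibrate(bands, residual, weights):
    # Weighted recombination; weights above 1 amplify a band (hypothetical
    # fixed weights here, standing in for learned, adaptive ones).
    return residual + sum(w * b for w, b in zip(weights, bands))
```

With all weights set to 1 the bands telescope back to the input, so `recalibrate` reconstructs the original array up to floating-point error. Raising the first weight boosts the finest band, which is where edges and local texture live.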

research
08/25/2023

Unlocking Fine-Grained Details with Wavelet-based High-Frequency Enhancement in Transformers

Medical image segmentation is a critical task that plays a vital role in...
research
09/07/2022

Joint Learning of Deep Texture and High-Frequency Features for Computer-Generated Image Detection

Distinguishing between computer-generated (CG) and natural photographic ...
research
07/11/2022

Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning

Multi-scale Vision Transformer (ViT) has emerged as a powerful backbone ...
research
03/09/2022

Anti-Oversmoothing in Deep Vision Transformers via the Fourier Domain Analysis: From Theory to Practice

Vision Transformer (ViT) has recently demonstrated promise in computer v...
research
08/24/2023

EFormer: Enhanced Transformer towards Semantic-Contour Features of Foreground for Portraits Matting

The portrait matting task aims to extract an alpha matte with complete s...
research
06/08/2022

Gaussian Fourier Pyramid for Local Laplacian Filter

Multi-scale processing is essential in image processing and computer gra...
research
05/13/2023

Meta-Polyp: a baseline for efficient Polyp segmentation

In recent years, polyp segmentation has gained significant importance, a...
