Token Transformer: Can class token help window-based transformer build better long-range interactions?

11/11/2022
by Jiawei Mao, et al.

Compared with the vanilla transformer, the window-based transformer offers a better trade-off between accuracy and efficiency. Despite this progress, its long-range modeling capability is limited by the size of the local window and by the window connection scheme. To address this problem, we propose a novel Token Transformer (TT). The core mechanism of TT is the addition of a Class (CLS) token to each local window to summarize that window's information. These CLS tokens interact spatially with the tokens in each window to enable long-range modeling; we refer to this type of token interaction as CLS Attention. To preserve the hierarchical design of the window-based transformer, we design a Feature Inheritance Module (FIM) in each phase of TT that delivers the local window information from the previous phase to the CLS token in the next phase. In addition, we design a Spatial-Channel Feedforward Network (SCFFN), which mixes CLS tokens and embedded tokens in the spatial and channel domains without additional parameters. Extensive experiments show that TT achieves competitive results with few parameters in image classification and downstream tasks.
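
The abstract does not spell out how CLS Attention is computed, so the following is only a rough sketch of the general idea of one summary token per window, not the authors' method. It assumes a 1D token sequence instead of a 2D feature map, parameter-free dot-product attention with no learned projections, and mean-initialized CLS tokens; the names `attend`, `partition_windows`, and `cls_attention_sketch` are hypothetical.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    # Single-query scaled dot-product attention.
    # Assumption: no learned query/key/value projections.
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def partition_windows(tokens, window_size):
    # Split a 1D token sequence into consecutive, non-overlapping windows.
    return [tokens[i:i + window_size] for i in range(0, len(tokens), window_size)]


def cls_attention_sketch(tokens, window_size):
    windows = partition_windows(tokens, window_size)
    dim = len(tokens[0])

    # Assumption: each CLS token is initialized as the mean of its window.
    cls_tokens = [
        [sum(t[d] for t in w) / len(w) for d in range(dim)] for w in windows
    ]

    # Step 1: each CLS token summarizes its own window.
    cls_tokens = [attend(c, w, w) for c, w in zip(cls_tokens, windows)]

    # Step 2: CLS tokens exchange information across windows.
    mixed = [attend(c, cls_tokens, cls_tokens) for c in cls_tokens]

    # Step 3: each token attends to its own window plus that window's
    # mixed CLS token, which carries information from the other windows.
    out = []
    for w, c in zip(windows, mixed):
        keys = w + [c]
        out.extend(attend(t, keys, keys) for t in w)
    return out, mixed


if __name__ == "__main__":
    toy = [[float(i), float(i % 3)] for i in range(8)]
    updated, cls = cls_attention_sketch(toy, window_size=4)
    print(len(updated), len(cls))
```

In this sketch a token in one window can be influenced by tokens in another window only through the CLS tokens, which is the intuition behind using them for long-range modeling. The FIM and SCFFN components are not covered here.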


research
11/25/2021

Global Interaction Modelling in Vision Transformer via Super Tokens

With the popularity of Transformer architectures in computer vision, the...
research
03/17/2023

HDformer: A Higher Dimensional Transformer for Diabetes Detection Utilizing Long Range Vascular Signals

Diabetes mellitus is a worldwide concern, and early detection can help t...
research
09/13/2023

Dynamic Spectrum Mixer for Visual Recognition

Recently, MLP-based vision backbones have achieved promising performance...
research
04/02/2021

TFill: Image Completion via a Transformer-Based Architecture

Bridging distant context interactions is important for high quality imag...
research
05/24/2023

Revenge of MLP in Sequential Recommendation

Sequential recommendation models sequences of historical user-item inter...
research
05/07/2023

RFR-WWANet: Weighted Window Attention-Based Recovery Feature Resolution Network for Unsupervised Image Registration

The Swin transformer has recently attracted attention in medical image a...
research
03/11/2023

TransMatting: Tri-token Equipped Transformer Model for Image Matting

Image matting aims to predict alpha values of elaborate uncertainty area...
