Improving Vision Transformers by Revisiting High-frequency Components

04/03/2022
by Jiawang Bai, et al.

Transformer models have shown promising effectiveness on various vision tasks. However, compared with training Convolutional Neural Network (CNN) models, training Vision Transformer (ViT) models is more difficult and relies on large-scale training sets. To explain this observation, we hypothesize that ViT models are less effective than CNN models at capturing the high-frequency components of images, and we verify this hypothesis with a frequency analysis. Inspired by this finding, we first investigate the effects of existing techniques for improving ViT models from a new frequency perspective, and find that the success of some techniques (e.g., RandAugment) can be attributed to better use of the high-frequency components. Then, to compensate for this insufficient ability of ViT models, we propose HAT, which directly augments the high-frequency components of images via adversarial training. We show that HAT can consistently boost the performance of various ViT models (e.g., by +1.2%), and in particular lifts the advanced model VOLO-D5 to 87.3%. This superiority is also maintained on out-of-distribution data and transfers to downstream tasks.
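
To make the notion of "high-frequency components" concrete, the following is a minimal sketch of one common way to split an image into low- and high-frequency parts with a radial mask in the Fourier domain. It is an illustration only, not the authors' implementation of the frequency analysis or of HAT; the function name `split_frequency` and the cutoff parameter `radius` are assumptions introduced here.

```python
import numpy as np


def split_frequency(image, radius):
    """Split a 2-D grayscale image into low- and high-frequency parts.

    `radius` is a hypothetical cutoff in frequency-bin units; it is an
    assumption for illustration, not a value taken from the paper.
    """
    h, w = image.shape
    # Centered 2-D spectrum: the zero-frequency (DC) term sits at (h // 2, w // 2).
    spectrum = np.fft.fftshift(np.fft.fft2(image))

    # Distance of every frequency bin from the DC term.
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    low_mask = dist <= radius

    # Keep bins inside the disc for the low-pass part, outside for the high-pass part.
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~low_mask)).real
    return low, high


rng = np.random.default_rng(0)
img = rng.random((8, 8))
low, high = split_frequency(img, radius=2)
# The two masks are complementary, so the parts add back up to the image.
print(np.allclose(low + high, img))
```

Because the two masks partition the spectrum, the low- and high-frequency parts always sum back to the input. A model's reliance on each band can then be probed by evaluating it on `low` or `high` alone, under whatever cutoff one chooses.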


