A Comprehensive Survey of Transformers for Computer Vision

11/11/2022
by   Sonain Jamil, et al.

Vision Transformers (ViTs), a special type of transformer, are used for various computer vision (CV) applications, such as image recognition. ViTs can solve several potential problems with convolutional neural networks (CNNs). For image coding tasks such as compression, super-resolution, segmentation, and denoising, different variants of ViTs are used. This survey presents the applications of ViTs in CV; to the best of our knowledge, it is the first survey of its kind on ViTs for CV. First, we classify the CV applications where ViTs are applicable: image classification, object detection, image segmentation, image compression, image super-resolution, image denoising, and anomaly detection. Next, we review the state of the art in each category and list the available models. We then present a detailed analysis and comparison of each model and list its pros and cons. After that, we present our insights and lessons learned for each category. Finally, we discuss several open research challenges and future research directions.

research
01/04/2021

Transformers in Vision: A Survey

Astounding results from transformer models on natural language tasks hav...
research
07/07/2022

Vision Transformers: State of the Art and Research Challenges

Transformers have achieved great success in natural language processing....
research
08/17/2021

spectrai: A deep learning framework for spectral data

Deep learning computer vision techniques have achieved many successes in...
research
03/16/2022

A Survey on Infrared Image and Video Sets

In this survey, we compile a list of publicly available infrared image a...
research
04/04/2023

Exploration of Lightweight Single Image Denoising with Transformers and Truly Fair Training

As multimedia content often contains noise from intrinsic defects of dig...
research
02/22/2023

A residual dense vision transformer for medical image super-resolution with segmentation-based perceptual loss fine-tuning

Super-resolution plays an essential role in medical imaging because it p...
research
09/25/2020

Training CNNs in Presence of JPEG Compression: Multimedia Forensics vs Computer Vision

Convolutional Neural Networks (CNNs) have proved very accurate in multip...
