Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles

06/01/2023
by Chaitanya Ryali, et al.

Modern hierarchical vision transformers have added several vision-specific components in the pursuit of supervised classification performance. While these components lead to effective accuracies and attractive FLOP counts, the added complexity actually makes these transformers slower than their vanilla ViT counterparts. In this paper, we argue that this additional bulk is unnecessary. By pretraining with a strong visual pretext task (MAE), we can strip out all the bells-and-whistles from a state-of-the-art multi-stage vision transformer without losing accuracy. In the process, we create Hiera, an extremely simple hierarchical vision transformer that is more accurate than previous models while being significantly faster both at inference and during training. We evaluate Hiera on a variety of tasks for image and video recognition. Our code and models are available at https://github.com/facebookresearch/hiera.
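
As a rough illustration of the masked-autoencoder (MAE) pretext task the abstract refers to, the sketch below trains a toy, non-hierarchical encoder to reconstruct randomly masked image patches. This is not the authors' Hiera code (the official implementation at https://github.com/facebookresearch/hiera applies masking in a way that is compatible with its multi-stage pooling); every module, dimension, and parameter name here is a hypothetical, minimal PyTorch stand-in.

```python
import torch
import torch.nn as nn


class TinyMAE(nn.Module):
    """Toy MAE sketch: encode only the visible patches, reconstruct the masked ones."""

    def __init__(self, img_size=224, patch=16, dim=192, depth=4, heads=3,
                 mask_ratio=0.6):
        super().__init__()
        self.patch, self.mask_ratio = patch, mask_ratio
        self.num_patches = (img_size // patch) ** 2
        # Patchify with a strided convolution, as in a plain ViT.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.pos_embed = nn.Parameter(torch.zeros(1, self.num_patches, dim))
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        block = lambda d: nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True), d)
        self.encoder, self.decoder = block(depth), block(1)
        self.head = nn.Linear(dim, patch * patch * 3)  # predict raw pixels per patch

    def random_mask(self, x):
        B, N, D = x.shape
        keep = int(N * (1 - self.mask_ratio))
        ids_shuffle = torch.rand(B, N, device=x.device).argsort(dim=1)
        ids_restore = ids_shuffle.argsort(dim=1)
        ids_keep = ids_shuffle[:, :keep]
        visible = torch.gather(x, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
        mask = torch.ones(B, N, device=x.device)
        mask[:, :keep] = 0                              # 0 = kept, 1 = masked
        return visible, torch.gather(mask, 1, ids_restore), ids_restore

    def forward(self, imgs):
        B, p = imgs.shape[0], self.patch
        x = self.patch_embed(imgs).flatten(2).transpose(1, 2) + self.pos_embed
        visible, mask, ids_restore = self.random_mask(x)
        latent = self.encoder(visible)                  # encoder sees visible tokens only
        # Append mask tokens and unshuffle back to the original patch order.
        fill = self.mask_token.expand(B, self.num_patches - latent.shape[1], -1)
        tokens = torch.cat([latent, fill], dim=1)
        tokens = torch.gather(
            tokens, 1, ids_restore.unsqueeze(-1).expand(-1, -1, tokens.shape[-1]))
        pred = self.head(self.decoder(tokens + self.pos_embed))
        # Pixel-MSE loss computed on the masked patches only.
        target = imgs.unfold(2, p, p).unfold(3, p, p)
        target = target.permute(0, 2, 3, 1, 4, 5).reshape(B, self.num_patches, -1)
        per_patch = ((pred - target) ** 2).mean(dim=-1)
        return (per_patch * mask).sum() / mask.sum()


if __name__ == "__main__":
    loss = TinyMAE()(torch.randn(2, 3, 224, 224))
    loss.backward()
    print(float(loss))
```

The mask ratio, token width, and single-layer decoder above are arbitrary choices for illustration; the relevant property, which the paper leverages, is that the encoder never processes the masked patches and the reconstruction loss is computed on them alone.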
