Taming Transformers for High-Resolution Image Synthesis

12/17/2020
by Patrick Esser, et al.

Designed to learn long-range interactions on sequential data, transformers continue to show state-of-the-art results on a wide variety of tasks. In contrast to CNNs, they contain no inductive bias that prioritizes local interactions. This makes them expressive, but also computationally infeasible for long sequences, such as high-resolution images. We demonstrate how combining the effectiveness of the inductive bias of CNNs with the expressivity of transformers enables them to model and thereby synthesize high-resolution images. We show how to (i) use CNNs to learn a context-rich vocabulary of image constituents, and in turn (ii) utilize transformers to efficiently model their composition within high-resolution images. Our approach is readily applied to conditional synthesis tasks, where both non-spatial information, such as object classes, and spatial information, such as segmentations, can control the generated image. In particular, we present the first results on semantically-guided synthesis of megapixel images with transformers. Project page: https://compvis.github.io/taming-transformers/
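
The two-stage idea in the abstract can be illustrated with a rough sketch. The abstract does not spell out the mechanism, so everything here is an assumption for illustration: the downsampling factor of 16, the toy two-dimensional codebook, and the helper names `attention_pairs` and `quantize` are hypothetical. The sketch shows (a) why full self-attention over raw pixels is costly, since the number of query-key pairs grows quadratically with sequence length, and (b) one common way a learned "vocabulary" could turn CNN features into a short token sequence, namely nearest-neighbour lookup in a codebook.

```python
def attention_pairs(seq_len: int) -> int:
    """Number of query-key pairs in full self-attention over seq_len tokens."""
    return seq_len * seq_len


def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codebook entry.

    Uses squared Euclidean distance. This is an illustrative assumption,
    not necessarily the authors' exact procedure.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: sq_dist(f, codebook[k]))
        for f in features
    ]


# (a) Sequence length: one token per pixel vs. one token per 16x16 patch.
# The factor 16 is a hypothetical choice for this example.
side = 1024
pixel_tokens = side * side                   # 1,048,576 tokens
latent_tokens = (side // 16) * (side // 16)  # 4,096 tokens

pixel_pairs = attention_pairs(pixel_tokens)
latent_pairs = attention_pairs(latent_tokens)
reduction = pixel_pairs // latent_pairs      # factor by which pair count drops

# (b) Toy codebook lookup: three entries, three feature vectors.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
features = [[0.9, 0.1], [0.1, 0.8], [0.2, 0.1]]
indices = quantize(features, codebook)       # discrete tokens for a transformer

if __name__ == "__main__":
    print(pixel_pairs, latent_pairs, reduction)
    print(indices)
```

Under these assumed sizes, shortening the sequence by a factor of 256 shrinks the attention pair count by a factor of 256 squared, which is what makes it plausible for a transformer to model the composition of megapixel images over codebook indices rather than pixels.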

research
08/10/2022

PatchDropout: Economizing Vision Transformers Using Patch Dropout

Vision transformers have demonstrated the potential to outperform CNNs i...
research
10/11/2022

Memory transformers for full context and high-resolution 3D Medical Segmentation

Transformer models achieve state-of-the-art results for image segmentati...
research
03/01/2021

Generative Adversarial Transformers

We introduce the GANsformer, a novel and efficient type of transformer, ...
research
08/11/2022

Deep is a Luxury We Don't Have

Medical images come in high resolutions. A high resolution is vital for ...
research
10/12/2022

FontTransformer: Few-shot High-resolution Chinese Glyph Image Synthesis via Stacked Transformers

Automatic generation of high-quality Chinese fonts from a few online tra...
research
03/25/2021

High-Fidelity Pluralistic Image Completion with Transformers

Image completion has made tremendous progress with convolutional neural ...
research
05/13/2021

High-Resolution Complex Scene Synthesis with Transformers

The use of coarse-grained layouts for controllable synthesis of complex ...
