Generative Adversarial Transformers

by Drew A. Hudson, et al.

We introduce the GANsformer, a novel and efficient type of transformer, and explore it for the task of visual generative modeling. The network employs a bipartite structure that enables long-range interactions across the image while maintaining linear computational efficiency, so it can readily scale to high-resolution synthesis. It iteratively propagates information from a set of latent variables to the evolving visual features and vice versa, supporting the refinement of each in light of the other and encouraging the emergence of compositional representations of objects and scenes. In contrast to the classic transformer architecture, it uses multiplicative integration that allows flexible region-based modulation, and can thus be seen as a generalization of the successful StyleGAN network. We demonstrate the model's strength and robustness through a careful evaluation over a range of datasets, from simulated multi-object environments to rich real-world indoor and outdoor scenes, showing that it achieves state-of-the-art results in terms of image quality and diversity, while enjoying fast learning and better data efficiency. Further qualitative and quantitative experiments offer insight into the model's inner workings, revealing improved interpretability and stronger disentanglement, and illustrating the benefits and efficacy of our approach. An implementation of the model is available at
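The bipartite interaction between latents and image features can be illustrated with a minimal NumPy sketch. It assumes single-head attention, an order in which the latents first attend to the image features and the features then attend back to the updated latents, and a StyleGAN-style scale-and-bias modulation of normalized features. All function and parameter names (`attend`, `bipartite_layer`, `w_gamma`, `w_beta`, and so on) are hypothetical, and this is not the authors' implementation. What it shows is the cost argument: each attention matrix has n×k entries for n image positions and k latents, so with k fixed the computation grows linearly in n rather than quadratically as in full self-attention over the image.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attend(queries, keys, values, w_q, w_k, w_v):
    """Single-head scaled dot-product attention from queries to keys/values."""
    q = queries @ w_q  # (m, d)
    k = keys @ w_k     # (p, d)
    v = values @ w_v   # (p, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (m, p): only m * p entries
    return softmax(scores, axis=-1) @ v       # (m, d)


def normalize(x, eps=1e-5):
    # Per-channel normalization over all positions (instance-norm style).
    mu = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def bipartite_layer(features, latents, params):
    """features: (n, d) image positions; latents: (k, d) latent vectors, k << n."""
    # 1) Latents gather information from the image (k x n attention).
    latents = latents + attend(latents, features, features, *params["y_from_x"])
    # 2) Each image position attends to the updated latents (n x k attention).
    a = attend(features, latents, latents, *params["x_from_y"])  # (n, d)
    # 3) Multiplicative integration: a per-position scale and bias derived
    #    from the attended latents modulate the normalized features.
    gamma = a @ params["w_gamma"]
    beta = a @ params["w_beta"]
    return gamma * normalize(features) + beta, latents


# Usage: a 32x32 feature grid (n = 1024 positions) and k = 16 latents.
rng = np.random.default_rng(0)
n, k, d = 32 * 32, 16, 64


def mat():
    # Random projection matrix with roughly unit-variance outputs.
    return rng.standard_normal((d, d)) / np.sqrt(d)


params = {
    "y_from_x": (mat(), mat(), mat()),
    "x_from_y": (mat(), mat(), mat()),
    "w_gamma": mat(),
    "w_beta": mat(),
}
x = rng.standard_normal((n, d))
y = rng.standard_normal((k, d))
x2, y2 = bipartite_layer(x, y, params)
print(x2.shape, y2.shape)  # (1024, 64) (16, 64)
```

In this sketch, replacing step 3 with an additive residual (`features + a`) would give a plain transformer-style update; the scale-and-bias form instead lets each image region receive its own modulation, which is how the abstract relates the design to StyleGAN's style modulation.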
