Text to Image Generation with Semantic-Spatial Aware GAN

by Wentong Liao, et al.

A text-to-image generation (T2I) model aims to generate photo-realistic images that are semantically consistent with the text descriptions. Built upon recent advances in generative adversarial networks (GANs), existing T2I models have made great progress. However, a close inspection of their generated images reveals two major limitations: (1) conditional batch normalization is applied to the whole image feature maps uniformly, ignoring local semantics; (2) the text encoder is fixed during training, whereas it should be trained jointly with the image generator to learn better text representations for image generation. To address these limitations, we propose a novel framework, Semantic-Spatial Aware GAN, which is trained in an end-to-end fashion so that the text encoder can exploit better text information. Concretely, we introduce a novel Semantic-Spatial Aware Convolution Network, which (1) learns a semantic-adaptive transformation conditioned on text to effectively fuse text features and image features, and (2) learns a mask map in a weakly-supervised way that depends on the current text-image fusion process in order to guide the transformation spatially. Experiments on the challenging COCO and CUB bird datasets demonstrate the advantage of our method over recent state-of-the-art approaches in terms of both visual fidelity and alignment with the input text description.
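The core idea described above can be illustrated with a minimal sketch: normalize the image feature maps, predict per-channel scale and shift parameters from the text embedding (the semantic-adaptive transformation), predict a spatial mask from the current features, and apply the text-conditioned transform only where the mask is active. All function and weight names below are hypothetical placeholders, not the authors' implementation.

```python
import numpy as np

def ssa_condition_batchnorm(feat, text_emb, W_gamma, W_beta, W_mask, eps=1e-5):
    """Hypothetical sketch of semantic-spatial conditional batch normalization.

    feat:     (B, C, H, W) image feature maps
    text_emb: (B, D) sentence embedding from the text encoder
    W_gamma:  (D, C) linear map predicting per-channel scale from text
    W_beta:   (D, C) linear map predicting per-channel shift from text
    W_mask:   (C, 1) linear map predicting a spatial mask from current features
    """
    # Standard batch normalization over the (batch, height, width) axes.
    mean = feat.mean(axis=(0, 2, 3), keepdims=True)
    var = feat.var(axis=(0, 2, 3), keepdims=True)
    norm = (feat - mean) / np.sqrt(var + eps)

    # Text-conditioned affine parameters (the semantic-adaptive transformation).
    gamma = text_emb @ W_gamma          # (B, C)
    beta = text_emb @ W_beta            # (B, C)

    # Spatial mask predicted from the current features; in the paper this mask
    # is learned in a weakly-supervised way during the text-image fusion.
    logits = np.einsum('bchw,co->bohw', feat, W_mask)   # (B, 1, H, W)
    mask = 1.0 / (1.0 + np.exp(-logits))                # sigmoid gate in (0, 1)

    # Apply the text-conditioned transform only where the mask is active,
    # so regions with mask ~ 0 keep the plain normalized features.
    g = gamma[:, :, None, None]
    b = beta[:, :, None, None]
    return norm * (1.0 + mask * g) + mask * b
```

When the text-conditioned scale and shift are zero, the layer reduces to ordinary batch normalization, which makes the gating behavior easy to verify; the mask lets different spatial regions receive different amounts of text conditioning, which is the stated fix for applying conditional batch normalization uniformly over the whole feature map.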


