Fine-grained Cross-modal Fusion based Refinement for Text-to-Image Synthesis

02/17/2023
by Haoran Sun, et al.

Text-to-image synthesis aims to generate visually realistic and semantically consistent images from given textual descriptions. Previous approaches generate an initial low-resolution image and then refine it to high resolution. Despite remarkable progress, these methods do not fully utilize the given text and can generate text-mismatched images, especially when the text description is complex. We propose a novel Fine-grained text-image Fusion based Generative Adversarial Networks, dubbed FF-GAN, which consists of two modules: the Fine-grained text-image Fusion Block (FF-Block) and Global Semantic Refinement (GSR). The FF-Block integrates an attention block and several convolution layers to fuse fine-grained word-context features into the corresponding visual features, so that the text information is fully used to refine the initial image with more details. The GSR improves the global semantic consistency between linguistic and visual features during the refinement process. Extensive experiments on the CUB-200 and COCO datasets demonstrate the superiority of FF-GAN over other state-of-the-art approaches in generating images that are semantically consistent with the given texts. Code is available at https://github.com/haoranhfut/FF-GAN.
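
To make the idea of fusing word-level features into visual features more concrete, the following is a minimal sketch of generic word-to-region attention. It is not the authors' FF-Block: the function names `softmax` and `word_attention_fuse` are hypothetical, the dot-product scoring is an assumption, and a simple residual addition stands in for the convolution layers mentioned in the abstract.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def word_attention_fuse(visual, words):
    """Fuse word features into region features (illustrative only).

    visual: list of N region vectors, each of length D
    words:  list of T word vectors, each of length D
    Returns (fused, weights): fused is N x D, weights is N x T.
    """
    fused, weights = [], []
    for region in visual:
        # Assumed scoring: dot product between the region and each word.
        scores = [sum(r * w for r, w in zip(region, word)) for word in words]
        attn = softmax(scores)  # attention over words for this region
        # Word-context vector: attention-weighted sum of the word vectors.
        context = [
            sum(a * word[d] for a, word in zip(attn, words))
            for d in range(len(region))
        ]
        # Assumed fusion: residual addition (placeholder for conv layers).
        fused.append([r + c for r, c in zip(region, context)])
        weights.append(attn)
    return fused, weights


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0]]
    word_feats = [[2.0, 0.0], [0.0, 2.0]]
    fused, weights = word_attention_fuse(regions, word_feats)
    print(weights)
    print(fused)
```

In this toy setup each region attends most strongly to the word whose feature vector points in the same direction, which is the intuition behind refining image regions with the most relevant words.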


