Progressive Text-to-Image Diffusion with Soft Latent Direction

09/18/2023
by Yuteng Ye, et al.

Despite the rapid progress of text-to-image generation, synthesizing and manipulating multiple entities while adhering to specific relational constraints remains challenging. This paper introduces a progressive synthesis and editing operation that systematically incorporates entities into the target image, ensuring that they satisfy spatial and relational constraints at each sequential step. Our key insight stems from the observation that while a pre-trained text-to-image diffusion model handles one or two entities well, it often falters when dealing with more. To address this limitation, we use a Large Language Model (LLM) to decompose intricate, lengthy text descriptions into coherent directives that follow strict formats. To execute directives involving distinct semantic operations (namely insertion, editing, and erasing), we formulate the Stimulus, Response, and Fusion (SRF) framework. Within this framework, latent regions are gently stimulated in alignment with each operation, and the responsive latent components are then fused to achieve cohesive entity manipulation. Our proposed framework yields notable improvements in object synthesis, particularly for intricate and lengthy textual inputs, and establishes a new benchmark for text-to-image generation tasks.
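
The abstract does not spell out the directive format or the fusion rule, so the following is only a minimal sketch under stated assumptions: each LLM directive is assumed to carry an operation, an entity phrase, and a normalized bounding box, and "fusion" is assumed to be a mask-weighted blend of the model's responsive latent into the current latent, applied one directive at a time. The names Directive, box_mask, fuse, apply_progressively, and respond are hypothetical illustrations, not the authors' code, and the latent is reduced to a single H x W channel of plain Python floats.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical: one channel of an H x W latent, as nested lists, for illustration.
Latent = List[List[float]]
Box = Tuple[float, float, float, float]  # normalized (x0, y0, x1, y1)


@dataclass
class Directive:
    # Hypothetical structured directive an LLM might emit from a long prompt.
    op: str      # assumed to be "insert", "edit", or "erase"
    entity: str  # entity phrase, e.g. "a red apple"
    box: Box     # assumed target region for this entity


def box_mask(h: int, w: int, box: Box) -> Latent:
    """Binary H x W mask: 1.0 inside the normalized box, 0.0 elsewhere."""
    x0, y0, x1, y1 = box
    c0, c1 = int(x0 * w), int(x1 * w)
    r0, r1 = int(y0 * h), int(y1 * h)
    return [[1.0 if r0 <= r < r1 and c0 <= c < c1 else 0.0 for c in range(w)]
            for r in range(h)]


def fuse(base: Latent, response: Latent, mask: Latent) -> Latent:
    """Assumed fusion rule: mask * response + (1 - mask) * base, element-wise.

    Soft masks with values in [0, 1] also work and give a gradual blend.
    """
    return [[m * s + (1.0 - m) * b for b, s, m in zip(b_row, s_row, m_row)]
            for b_row, s_row, m_row in zip(base, response, mask)]


def apply_progressively(base: Latent, directives: List[Directive],
                        respond: Callable[[Latent, Directive], Latent]) -> Latent:
    """Apply directives one at a time, fusing each response into the latent.

    `respond` is a hypothetical stand-in for the diffusion model's response
    to the stimulated region; it must return a latent of the same shape.
    """
    z = base
    for d in directives:
        mask = box_mask(len(z), len(z[0]), d.box)
        z = fuse(z, respond(z, d), mask)
    return z


if __name__ == "__main__":
    # Toy stub: "insert"/"edit" respond with 1.0 everywhere, "erase" with 0.0.
    def stub(z: Latent, d: Directive) -> Latent:
        value = 0.0 if d.op == "erase" else 1.0
        return [[value for _ in row] for row in z]

    plan = [Directive("insert", "a cat", (0.0, 0.0, 0.5, 0.5)),
            Directive("erase", "a cat", (0.0, 0.0, 0.25, 0.25))]
    out = apply_progressively([[0.0] * 4 for _ in range(4)], plan, stub)
    # The insert sets the top-left 2x2 block to 1.0; the erase resets cell (0, 0).
    for row in out:
        print(row)
```

Processing directives sequentially, so that each step sees the latent produced by the previous one, mirrors the progressive, one-entity-at-a-time idea described above; the real model's stimulus and response steps are far richer than this constant-valued stub.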
