Composite Diffusion | whole >= Σparts

by   Vikram Jamwal, et al.

For an artist or a graphic designer, the spatial layout of a scene is a critical design choice. However, existing text-to-image diffusion models provide limited support for incorporating spatial information. This paper introduces Composite Diffusion as a means for artists to generate high-quality images by composing from the sub-scenes. The artists can specify the arrangement of these sub-scenes through a flexible free-form segment layout. They can describe the content of each sub-scene primarily using natural text and additionally by utilizing reference images or control inputs such as line art, scribbles, human pose, canny edges, and more. We provide a comprehensive and modular method for Composite Diffusion that enables alternative ways of generating, composing, and harmonizing sub-scenes. Further, we wish to evaluate the composite image for effectiveness in both image quality and achieving the artist's intent. We argue that existing image quality metrics lack a holistic evaluation of image composites. To address this, we propose novel quality criteria especially relevant to composite generation. We believe that our approach provides an intuitive method of art creation. Through extensive user surveys, quantitative and qualitative analysis, we show how it achieves greater spatial, semantic, and creative control over image generation. In addition, our methods do not need to retrain or modify the architecture of the base diffusion models and can work in a plug-and-play manner with the fine-tuned models.


page 3

page 25

page 26

page 30

page 31

page 38

page 40

page 42


SpaText: Spatio-Textual Representation for Controllable Image Generation

Recent text-to-image diffusion models are able to generate convincing re...

SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-form Layout-to-Image Generation

Despite significant progress in Text-to-Image (T2I) generative models, e...

Design Booster: A Text-Guided Diffusion Model for Image Translation with Spatial Layout Preservation

Diffusion models are able to generate photorealistic images in arbitrary...

LAW-Diffusion: Complex Scene Generation by Diffusion with Layouts

Thanks to the rapid development of diffusion models, unprecedented progr...

Integrating Geometric Control into Text-to-Image Diffusion Models for High-Quality Detection Data Generation via Text Prompt

Diffusion models have attracted significant attention due to their remar...

Instruct 3D-to-3D: Text Instruction Guided 3D-to-3D conversion

We propose a high-quality 3D-to-3D conversion method, Instruct 3D-to-3D....

NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models

Automatically generating high-quality real world 3D scenes is of enormou...

Please sign up or login with your details

Forgot password? Click here to reset