
Collage Diffusion

03/01/2023
by Vishnu Sarukkai, et al.

Text-conditional diffusion models generate high-quality, diverse images. However, text alone is often an ambiguous specification of a desired target image, creating the need for additional user-friendly controls for diffusion-based image generation. We focus on giving users precise control over image output for scenes containing several objects. Users control image generation by defining a collage: a text prompt paired with an ordered sequence of layers, where each layer is an RGBA image with a corresponding text prompt. We introduce Collage Diffusion, a collage-conditional diffusion algorithm that lets users control both the spatial arrangement and the visual attributes of objects in the scene, and that enables editing of individual components of generated images. To ensure that different parts of the input text correspond to the locations specified by the input collage layers, Collage Diffusion modifies text-image cross-attention using the layers' alpha masks. To preserve characteristics of individual collage layers that are not specified in text, Collage Diffusion learns specialized text representations per layer. Collage input also enables layer-based controls that provide fine-grained control over the final output: users can control image harmonization on a layer-by-layer basis, and they can edit individual objects in generated images while keeping other objects fixed. Collage-conditional image generation requires harmonizing the input collage so that objects fit together: the key challenge is to minimize changes to the positions and key visual attributes of objects in the input collage while allowing other attributes to change during harmonization. By leveraging the rich information present in layer input, Collage Diffusion generates globally harmonized images that maintain desired object locations and visual characteristics better than prior approaches.
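To make the interface concrete, here is a minimal Python sketch of the collage input structure and one plausible form of the alpha-masked cross-attention described above. This is not the authors' implementation: the names (Layer, Collage, alpha_to_patch_mask, masked_cross_attention) and the exact masking rule (hard-masking attention scores at patches outside a token's layer) are assumptions for illustration.

```python
# Illustrative sketch only, not the authors' released code.
from dataclasses import dataclass

import torch
import torch.nn.functional as F


@dataclass
class Layer:
    rgba: torch.Tensor  # (4, H, W) float image in [0, 1]; rgba[3] is the alpha mask
    prompt: str         # per-layer text prompt describing this object


@dataclass
class Collage:
    prompt: str          # global text prompt for the whole scene
    layers: list[Layer]  # ordered back-to-front


def alpha_to_patch_mask(alpha: torch.Tensor, h: int, w: int,
                        threshold: float = 0.5) -> torch.Tensor:
    """Downsample a layer's (H, W) alpha channel to the denoiser's h x w
    latent grid and flatten it to a (h*w,) binary mask."""
    small = F.interpolate(alpha[None, None], size=(h, w), mode="bilinear")
    return (small.flatten() > threshold).float()


def masked_cross_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                           token_masks: torch.Tensor) -> torch.Tensor:
    """Cross-attention in which each text token is only visible from the
    image patches covered by its layer's alpha mask.

    q:           (P, d) image-patch queries (P = h * w latent positions)
    k, v:        (T, d) text-token keys / values
    token_masks: (T, P) binary; token_masks[t, p] = 1 if token t may be
                 attended to from patch p. Rows for global-prompt tokens
                 should be all ones so every patch keeps valid targets.
    """
    scores = q @ k.T / k.shape[-1] ** 0.5                     # (P, T)
    scores = scores.masked_fill(token_masks.T == 0, float("-inf"))
    return scores.softmax(dim=-1) @ v                         # (P, d)
```

In this reading, the per-layer prompts are concatenated into one token sequence, each layer's tokens inherit that layer's downsampled alpha mask as their row of token_masks, and the global prompt's tokens are left unmasked. The paper's actual cross-attention modification may differ in detail, for example by using soft rather than hard masking.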


