Integrating Geometric Control into Text-to-Image Diffusion Models for High-Quality Detection Data Generation via Text Prompt

06/07/2023
by Kai Chen, et al.

Diffusion models have attracted significant attention due to their remarkable ability to create content and generate data for tasks such as image classification. However, using diffusion models to generate high-quality object detection data remains an underexplored area, where not only image-level perceptual quality but also geometric conditions such as bounding boxes and camera views are essential. Previous studies have relied on either copy-paste synthesis or layout-to-image (L2I) generation with specifically designed modules to encode semantic layouts. In this paper, we propose GeoDiffusion, a simple framework that flexibly translates various geometric conditions into text prompts, empowering pre-trained text-to-image (T2I) diffusion models to generate high-quality detection data. Unlike previous L2I methods, GeoDiffusion can encode not only bounding boxes but also additional geometric conditions such as camera views in self-driving scenes. Extensive experiments demonstrate that GeoDiffusion outperforms previous L2I methods while training 4x faster. To the best of our knowledge, this is the first work to adopt diffusion models for layout-to-image generation with geometric conditions and to demonstrate that L2I-generated images can benefit the performance of object detectors.
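To make the core idea concrete, below is a minimal sketch of how geometric conditions might be serialized into a text prompt. The function name `encode_layout`, the `<loc_k>` token format, and the choice of 256 coordinate bins are illustrative assumptions, not the paper's exact tokenization scheme.

```python
def encode_layout(camera_view, boxes, image_size=(800, 600), num_bins=256):
    """Serialize a camera view and bounding boxes into a text prompt.

    Hypothetical sketch: each box corner coordinate is discretized into
    one of `num_bins` bins and rendered as a location token, so the
    geometry can be consumed by a text-to-image model's text encoder.
    GeoDiffusion's actual prompt template may differ.
    """
    w, h = image_size
    parts = [f"An image of the {camera_view} camera view"]
    for category, (x1, y1, x2, y2) in boxes:
        tokens = [
            f"<loc_{int(v / extent * (num_bins - 1))}>"
            for v, extent in ((x1, w), (y1, h), (x2, w), (y2, h))
        ]
        parts.append(f"{category} {' '.join(tokens)}")
    return ", ".join(parts)


prompt = encode_layout(
    "front",
    [("car", (100, 200, 300, 400)), ("pedestrian", (400, 150, 450, 350))],
)
print(prompt)
```

The prompt string can then be fed to a frozen or fine-tuned T2I diffusion model's text encoder, so no extra layout-encoding module is needed.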

Related research:
- MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation (02/16/2023)
- Generating Driving Scenes with Diffusion (05/29/2023)
- Freestyle Layout-to-Image Synthesis (03/25/2023)
- LayoutDM: Discrete Diffusion Model for Controllable Layout Generation (03/14/2023)
- Text2NeRF: Text-Driven 3D Scene Generation with Neural Radiance Fields (05/19/2023)
- Composite Diffusion | whole >= Σparts (07/25/2023)
- NViSII: A Scriptable Tool for Photorealistic Image Generation (05/28/2021)
