Adding 3D Geometry Control to Diffusion Models

by Wufei Ma, et al.

Diffusion models have emerged as a powerful method of generative modeling across a range of fields, capable of producing photo-realistic images from natural language descriptions. However, these models lack explicit control over the 3D structure of the objects in the generated images. In this paper, we propose a novel method that incorporates 3D geometry control into diffusion models, enabling them to generate more realistic and diverse images. To achieve this, our method exploits ControlNet, which extends diffusion models by using visual prompts in addition to text prompts. We take 3D objects from a 3D shape repository (e.g., ShapeNet and Objaverse), render them from a variety of poses and viewing directions, compute the edge maps of the rendered images, and use these edge maps as visual prompts to generate realistic images. With explicit 3D geometry control, we can easily change the 3D structures of the objects in the generated images and obtain ground-truth 3D annotations automatically. This allows us to use the generated images to improve a range of vision tasks, e.g., classification and 3D pose estimation, in both in-distribution (ID) and out-of-distribution (OOD) settings. We demonstrate the effectiveness of our method through extensive experiments on the ImageNet-50, ImageNet-R, PASCAL3D+, ObjectNet3D, and OOD-CV datasets. The results show that our method significantly outperforms existing methods across multiple benchmarks (e.g., by 4.6 percentage points on ImageNet-50 using ViT and by 3.5 percentage points on PASCAL3D+ and ObjectNet3D using NeMo).
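The render-then-edge-map step can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the paper: a tiny synthetic grayscale grid stands in for a rendered view of a 3D model, a plain Sobel gradient threshold stands in for whatever edge detector the authors use, and the names `sobel_edge_map` and `make_sample` are hypothetical. The point is only that each edge map can be stored together with the pose used for rendering, so the 3D annotation comes for free.

```python
import math


def sobel_edge_map(image, threshold=1.0):
    """Binary edge map (lists of 0/1) from a grayscale image given as lists of floats.

    Assumption: a simple Sobel magnitude threshold, used here for illustration only.
    """
    h, w = len(image), len(image[0])
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal gradient kernel
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical gradient kernel
    edges = [[0] * w for _ in range(h)]
    # Border pixels are left at 0 because the 3x3 window does not fit there.
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(kx[j][i] * image[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(ky[j][i] * image[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            if math.hypot(gx, gy) >= threshold:
                edges[y][x] = 1
    return edges


def make_sample(rendered, azimuth_deg, elevation_deg):
    """Pair the visual prompt (edge map) with the known rendering pose."""
    return {
        "edge_map": sobel_edge_map(rendered),
        "pose": {"azimuth_deg": azimuth_deg, "elevation_deg": elevation_deg},
    }


# Stand-in for a rendered view: a bright 4x4 square on a dark 8x8 background.
rendered = [[1.0 if 2 <= y <= 5 and 2 <= x <= 5 else 0.0 for x in range(8)]
            for y in range(8)]
sample = make_sample(rendered, azimuth_deg=30.0, elevation_deg=15.0)
```

In a full pipeline, the `edge_map` entry would be passed to the edge-conditioned generator alongside a text prompt, while the `pose` entry would be kept as the label for the generated image.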




