ReVersion: Diffusion-Based Relation Inversion from Images

by Ziqi Huang, et al.

Diffusion models have gained increasing popularity for their generative capabilities. Recently, there has been a surging need to generate customized images by inverting diffusion models from exemplar images. However, existing inversion methods mainly focus on capturing object appearances. How to invert object relations, another important pillar of the visual world, remains unexplored. In this work, we propose ReVersion for the Relation Inversion task, which aims to learn a specific relation (represented as a "relation prompt") from exemplar images. Specifically, we learn a relation prompt from a frozen pre-trained text-to-image diffusion model. The learned relation prompt can then be applied to generate relation-specific images with new objects, backgrounds, and styles. Our key insight is the "preposition prior": real-world relation prompts can be sparsely activated upon a set of basis prepositional words. Building on this, we propose a novel relation-steering contrastive learning scheme to impose two critical properties on the relation prompt: 1) the relation prompt should capture the interaction between objects, enforced by the preposition prior; 2) the relation prompt should be disentangled from object appearances. We further devise relation-focal importance sampling to emphasize high-level interactions over low-level appearances (e.g., texture, color). To comprehensively evaluate this new task, we contribute the ReVersion Benchmark, which provides various exemplar images with diverse relations. Extensive experiments validate the superiority of our approach over existing methods across a wide range of visual relations.
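The abstract names two training-time mechanisms: a contrastive objective that steers the relation prompt toward prepositions and away from appearance words, and an importance sampling scheme that favors timesteps governing high-level structure. The sketch below illustrates both ideas in plain Python. It is not the authors' implementation: the cosine-similarity InfoNCE form, the temperature of 0.07, the weighting function 1 - alpha * cos(pi * t / T) with alpha = 0.5, and all function names are assumptions chosen for illustration. In practice the vectors would be token embeddings from the frozen model's text encoder rather than hand-written lists.

```python
import math
import random
from typing import Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> list[float]:
    # Scale to unit length (assumes a non-zero vector).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _cos(a: Vector, b: Vector) -> float:
    # Cosine similarity between two embeddings.
    return sum(x * y for x, y in zip(_normalize(a), _normalize(b)))


def steering_contrastive_loss(relation: Vector, positive: Vector,
                              negatives: list[Vector],
                              temperature: float = 0.07) -> float:
    """Hypothetical InfoNCE-style loss: pull the relation embedding toward one
    preposition embedding (positive) and push it away from object/appearance
    word embeddings (negatives). Lower means better aligned with the positive."""
    pos = math.exp(_cos(relation, positive) / temperature)
    neg = sum(math.exp(_cos(relation, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


def relation_focal_weights(num_steps: int, alpha: float = 0.5) -> list[float]:
    """Hypothetical sampling weights over timesteps t = 1..T. Since cos(pi*t/T)
    decreases on [0, pi], the weights increase with t, so noisier steps
    (where layout and interactions form) are sampled more often."""
    raw = [1.0 - alpha * math.cos(math.pi * t / num_steps)
           for t in range(1, num_steps + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def sample_timestep(weights: list[float], rng: random.Random) -> int:
    # Draw one timestep in 1..T according to the given weights.
    return rng.choices(range(1, len(weights) + 1), weights=weights, k=1)[0]


if __name__ == "__main__":
    on_emb = [1.0, 0.0, 0.0]   # stand-in for a preposition embedding
    cat_emb = [0.0, 1.0, 0.0]  # stand-in for an object-word embedding
    aligned = steering_contrastive_loss([0.9, 0.1, 0.0], on_emb, [cat_emb])
    entangled = steering_contrastive_loss([0.1, 0.9, 0.0], on_emb, [cat_emb])
    print(aligned < entangled)
    weights = relation_focal_weights(1000)
    print(sample_timestep(weights, random.Random(0)))
```

In a training loop, one would draw t with `sample_timestep` and add the contrastive term to the usual denoising loss; how the two terms are combined and weighted is likewise an assumption of this sketch.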


DiffusionSeg: Adapting Diffusion Towards Unsupervised Object Discovery

Learning from a large corpus of data, pre-trained models have achieved i...

Improved Diffusion-based Image Colorization via Piggybacked Models

Image colorization has been attracting the research interests of the com...

TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition

Text-driven diffusion models have exhibited impressive generative capabi...

Learning to Learn Relation for Important People Detection in Still Images

Humans can easily recognize the importance of people in social event ima...

LAW-Diffusion: Complex Scene Generation by Diffusion with Layouts

Thanks to the rapid development of diffusion models, unprecedented progr...

RAID: A Relation-Augmented Image Descriptor

As humans, we regularly interpret images based on the relations between ...

Improving Visual Relation Detection using Depth Maps

State of the art visual relation detection methods have been relying on ...