DomainStudio: Fine-Tuning Diffusion Models for Domain-Driven Image Generation using Limited Data

by Jingyuan Zhu, et al.

Denoising diffusion probabilistic models (DDPMs) have been proven capable of synthesizing high-quality images with remarkable diversity when trained on large amounts of data. However, both typical diffusion models and modern large-scale conditional generative models, such as text-to-image models, are prone to overfitting when fine-tuned on extremely limited data. Existing works have explored subject-driven generation using a reference set containing a few images, but few have explored DDPM-based domain-driven generation, which aims to learn the common features of a target domain while maintaining diversity. This paper proposes DomainStudio, a novel approach for adapting DDPMs pre-trained on large-scale source datasets to target domains using limited data. It is designed to preserve the subject diversity provided by the source domain while producing high-quality, diverse adapted samples in the target domain. We propose to preserve the relative distances between adapted samples to achieve considerable generation diversity. In addition, we enhance the learning of high-frequency details for better generation quality. Our approach is compatible with both unconditional and conditional diffusion models. This work makes the first attempt at unconditional few-shot image generation with diffusion models, achieving better quality and greater diversity than current state-of-the-art GAN-based approaches. Moreover, it significantly relieves overfitting in conditional generation and achieves high-quality domain-driven generation, further expanding the applicable scenarios of modern large-scale text-to-image models.
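
The idea of keeping the relative distances between adapted samples can be illustrated with a small sketch. The abstract does not spell out the loss, so everything below is an assumption made for illustration: each sample in a batch is represented by a feature vector, its cosine similarities to the other samples are turned into a probability distribution with a softmax, and the adapted model is penalized (here with a KL divergence, whose direction is also an assumption) when its distributions drift from those of the source model. The function names `similarity_distributions` and `relative_distance_loss` are hypothetical and do not come from the paper.

```python
import math


def _cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_distributions(features):
    """For each sample i, softmax over its cosine similarities to all j != i."""
    n = len(features)
    rows = []
    for i in range(n):
        sims = [_cosine(features[i], features[j]) for j in range(n) if j != i]
        peak = max(sims)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in sims]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows


def relative_distance_loss(source_features, adapted_features):
    """Mean KL(source || adapted) between per-sample similarity distributions.

    Assumption: both inputs describe the same batch (same noise inputs),
    once under the source model and once under the adapted model.
    """
    p_rows = similarity_distributions(source_features)
    q_rows = similarity_distributions(adapted_features)
    kl = 0.0
    for p_row, q_row in zip(p_rows, q_rows):
        kl += sum(p * math.log(p / q) for p, q in zip(p_row, q_row))
    return kl / len(p_rows)


# Toy batch of three 2-D "features" (made-up numbers, not real model outputs).
source = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
collapsed = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]  # all samples identical

loss_same = relative_distance_loss(source, source)
loss_collapsed = relative_distance_loss(source, collapsed)
```

In this toy setup the loss is zero when the adapted features reproduce the source structure and positive when the adapted samples collapse onto a single point, which is the overfitting behavior such a term is meant to discourage.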



Few-shot Image Generation with Diffusion Models

Denoising diffusion probabilistic models (DDPMs) have been proven capabl...

DreamArtist: Towards Controllable One-Shot Text-to-Image Generation via Contrastive Prompt-Tuning

Large-scale text-to-image generation models have achieved remarkable pro...

Few-shot 3D Shape Generation

Realistic and diverse 3D shape generation is helpful for a wide variety ...

A Prompt Log Analysis of Text-to-Image Generation Systems

Recent developments in large language models (LLM) and generative AI hav...

DiffRoom: Diffusion-based High-Quality 3D Room Reconstruction and Generation

We present DiffRoom, a novel framework for tackling the problem of high-...

What Does DALL-E 2 Know About Radiology?

Generative models such as DALL-E 2 could represent a promising future to...

Large-Vocabulary 3D Diffusion Model with Transformer

Creating diverse and high-quality 3D assets with an automatic generative...
