Video-P2P: Video Editing with Cross-attention Control

03/08/2023
by   Shaoteng Liu, et al.

This paper presents Video-P2P, a novel framework for real-world video editing with cross-attention control. While attention control has proven effective for image editing with pre-trained image generation models, no large-scale video generation model is currently publicly available. Video-P2P addresses this limitation by adapting an image generation diffusion model to a range of video editing tasks. Specifically, we first tune a Text-to-Set (T2S) model to obtain an approximate inversion, then optimize a shared unconditional embedding to achieve accurate video inversion at a small memory cost. For attention control, we introduce a novel decoupled-guidance strategy that applies different guidance to the source and target prompts: the optimized unconditional embedding for the source prompt improves reconstruction quality, while an initialized unconditional embedding for the target prompt enhances editability. Incorporating the attention maps of these two branches enables detailed editing. These technical designs support various text-driven editing applications, including word swap, prompt refinement, and attention re-weighting. Video-P2P works well on real-world videos, generating new characters while faithfully preserving the original poses and scenes, and significantly outperforms previous approaches.
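The decoupled-guidance strategy described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `denoise` stands in for a hypothetical noise-prediction network, and `opt_null` / `init_null` represent the optimized and freshly initialized unconditional embeddings; all names are placeholders chosen for this sketch.

```python
import numpy as np

def classifier_free_guidance(eps_uncond, eps_cond, scale=7.5):
    # Standard classifier-free guidance: push the conditional prediction
    # away from the unconditional one by a guidance scale.
    return eps_uncond + scale * (eps_cond - eps_uncond)

def decoupled_guidance_step(denoise, z_t, t, src_ctx, tgt_ctx,
                            opt_null, init_null, scale=7.5):
    """One denoising step with decoupled guidance (sketch).

    The source branch pairs the source prompt with the *optimized*
    unconditional embedding (better reconstruction); the target branch
    pairs the target prompt with an *initialized* unconditional
    embedding (better editability). The two branches' attention maps
    would then be combined for detailed editing (not shown here).
    """
    eps_src = classifier_free_guidance(
        denoise(z_t, t, opt_null), denoise(z_t, t, src_ctx), scale)
    eps_tgt = classifier_free_guidance(
        denoise(z_t, t, init_null), denoise(z_t, t, tgt_ctx), scale)
    return eps_src, eps_tgt
```

In the full method the source branch reconstructs the input video while the target branch carries the edit, so each branch gets the unconditional embedding best suited to its role.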


Related research:
- Prompt Tuning Inversion for Text-Driven Image Editing Using Diffusion Models (05/08/2023)
- Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models (03/30/2023)
- LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance (07/02/2023)
- MDP: A Generalized Framework for Text-Guided Image Editing by Manipulating the Diffusion Path (03/29/2023)
- Improving Negative-Prompt Inversion via Proximal Guidance (06/08/2023)
- Third Time's the Charm? Image and Video Editing with StyleGAN3 (01/31/2022)
- T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models (02/16/2023)
