Long-Term Rhythmic Video Soundtracker

05/02/2023
by Jiashuo Yu et al.

We consider the problem of generating musical soundtracks in sync with rhythmic visual cues. Most existing works rely on pre-defined music representations, which limits generative flexibility and complexity. Other methods that directly generate video-conditioned waveforms suffer from limited scenarios, short output lengths, and unstable generation quality. To this end, we present Long-Term Rhythmic Video Soundtracker (LORIS), a novel framework to synthesize long-term conditional waveforms. Specifically, our framework consists of a latent conditional diffusion probabilistic model that performs waveform synthesis. Furthermore, a series of context-aware conditioning encoders is proposed to take temporal information into account for long-term generation. Notably, we extend our model's applicability from dance to multiple sports scenarios such as floor exercise and figure skating. To perform comprehensive evaluations, we establish a benchmark for rhythmic video soundtracks, including a pre-processed dataset, improved evaluation metrics, and robust generative baselines. Extensive experiments show that our model generates long-term soundtracks with state-of-the-art musical quality and rhythmic correspondence. Code is available at <https://github.com/OpenGVLab/LORIS>.
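For readers unfamiliar with latent conditional diffusion, the sketch below shows the generic DDPM-style training objective such a model is typically built on: a clean latent z_0 is noised to z_t via q(z_t | z_0) = N(sqrt(abar_t) z_0, (1 - abar_t) I), and a network is trained to predict the added noise given z_t, the step t, and a conditioning signal. This is a minimal sketch under stated assumptions, not LORIS's implementation: the linear beta schedule, the epsilon-prediction loss, and all function names (`linear_beta_schedule`, `alpha_bars`, `q_sample`, `training_loss`) are hypothetical, and `cond` is only a placeholder for whatever the context-aware conditioning encoders would produce.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed DDPM-style default)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative products abar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(z0, t, abar, rng):
    """Draw z_t ~ N(sqrt(abar_t) * z_0, (1 - abar_t) * I); also return the noise used."""
    a = abar[t]
    noise = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(z0, noise)]
    return zt, noise


def training_loss(denoiser, z0, cond, abar, rng):
    """Epsilon-prediction MSE for one latent.

    `denoiser(zt, t, cond)` is a hypothetical network that predicts the noise;
    `cond` stands in for the (assumed) output of the conditioning encoders.
    """
    t = rng.randrange(len(abar))  # sample a diffusion step uniformly
    zt, noise = q_sample(z0, t, abar, rng)
    pred = denoiser(zt, t, cond)
    return sum((p - n) ** 2 for p, n in zip(pred, noise)) / len(noise)
```

As a usage example, a dummy denoiser that always returns zeros yields a loss equal to the mean squared noise, which is a quick sanity check before plugging in a real network. Predicting the noise rather than z_0 directly is the common DDPM parameterization; whether LORIS uses it is not stated in this abstract.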
