Multi-Variate Temporal GAN for Large Scale Video Generation

by Andres Muñoz, et al.

In this paper, we present a network architecture for video generation that models spatio-temporal consistency without resorting to costly 3D architectures. In particular, we elaborate on the components of noise generation, sequence generation, and frame generation. The architecture facilitates information exchange between neighboring time points, which improves the temporal consistency of the generated frames at both the structural and detail levels. The approach achieves state-of-the-art quantitative performance, as measured by the Inception Score, on the UCF-101 dataset, which is in line with a qualitative inspection of the generated videos. We also introduce a new quantitative measure that uses downstream tasks for evaluation.
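The three components named above (noise generation, sequence generation, frame generation) can be illustrated with a minimal NumPy sketch. This is not the paper's architecture: the dimensions, the autoregressive latent mixing used to share information between neighboring time points, and the single linear frame generator are all placeholder assumptions standing in for learned recurrent and convolutional modules.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, not taken from the paper.
T, Z, H, W = 8, 16, 4, 4  # frames, latent size, frame height/width

def generate_noise_sequence(T, Z, alpha=0.5):
    """Sequence generation sketch: each frame's latent mixes in its
    predecessor's, so neighboring time points exchange information."""
    z = [rng.standard_normal(Z)]
    for _ in range(T - 1):
        z.append(alpha * z[-1] + np.sqrt(1 - alpha**2) * rng.standard_normal(Z))
    return np.stack(z)  # shape (T, Z)

def generate_frames(latents, weights):
    """Frame generation sketch: a shared linear map plus tanh applied per
    time step; a real model would use learned 2D convolutional layers."""
    frames = np.tanh(latents @ weights)  # shape (T, H*W), values in [-1, 1]
    return frames.reshape(len(latents), H, W)

W_frame = rng.standard_normal((Z, H * W)) * 0.1
video = generate_frames(generate_noise_sequence(T, Z), W_frame)
print(video.shape)  # (8, 4, 4)
```

Because consecutive latents are correlated rather than independent, consecutive frames vary smoothly, which is the property the architecture's information exchange between time points is designed to achieve without any 3D convolutions.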
