VideoFlow: A Flow-Based Generative Model for Video

by   Manoj Kumar, et al.

Generative models that can model and predict sequences of future events can, in principle, learn to capture complex real-world phenomena, such as physical interactions. In particular, learning predictive models of videos offers an especially appealing mechanism to enable a rich understanding of the physical world: videos of real-world interactions are plentiful and readily available, and a model that can predict future video frames can not only capture useful representations of the world, but can be useful in its own right, for problems such as model-based robotic control. However, a central challenge in video prediction is that the future is highly uncertain: a sequence of past observations of events can imply many possible futures. Although a number of recent works have studied probabilistic models that can represent uncertain futures, such models are either extremely expensive computationally (as in the case of pixel-level autoregressive models), or do not directly optimize the likelihood of the data. In this work, we propose a model for video prediction based on normalizing flows, which allows for direct optimization of the data likelihood, and produces high-quality stochastic predictions. To our knowledge, our work is the first to propose multi-frame video prediction with normalizing flows. We describe an approach for modeling the latent space dynamics, and demonstrate that flow-based generative models offer a viable and competitive approach to generative modeling of video.


page 5

page 8

page 9

page 10


Latent Video Transformer

The video generation task can be formulated as a prediction of future vi...

Stochastic Variational Video Prediction

Predicting the future in real-world settings, particularly from raw sens...

Video (language) modeling: a baseline for generative models of natural videos

We propose a strong baseline model for unsupervised feature learning usi...

T3VIP: Transformation-based 3D Video Prediction

For autonomous skill acquisition, robots have to learn about the physica...

Learning to Decompose and Disentangle Representations for Video Prediction

Our goal is to predict future video frames given a sequence of input fra...

Neural Foundations of Mental Simulation: Future Prediction of Latent Representations on Dynamic Scenes

Humans and animals have a rich and flexible understanding of the physica...

Learning Temporal Transformations From Time-Lapse Videos

Based on life-long observations of physical, chemical, and biologic phen...

Please sign up or login with your details

Forgot password? Click here to reset