SUDS: Scalable Urban Dynamic Scenes

03/25/2023
by Haithem Turki et al.

We extend neural radiance fields (NeRFs) to dynamic large-scale urban scenes. Prior work tends to reconstruct single video clips of short duration (up to 10 seconds). Two reasons are that such methods (a) tend to scale linearly with the number of moving objects and input videos, because a separate model is built for each, and (b) tend to require supervision via 3D bounding boxes and panoptic labels, obtained manually or via category-specific models. As a step towards truly open-world reconstructions of dynamic cities, we introduce two key innovations: (a) we factorize the scene into three separate hash table data structures to efficiently encode static, dynamic, and far-field radiance fields, and (b) we make use of unlabeled target signals consisting of RGB images, sparse LiDAR, off-the-shelf self-supervised 2D descriptors, and, most importantly, 2D optical flow. Operationalizing such inputs via photometric, geometric, and feature-metric reconstruction losses enables SUDS to decompose dynamic scenes into the static background, individual objects, and their motions. Combined with our multi-branch table representation, such reconstructions can be scaled to tens of thousands of objects across 1.2 million frames from 1700 videos spanning geospatial footprints of hundreds of kilometers, (to our knowledge) the largest dynamic NeRF built to date. We present initial qualitative results on a variety of tasks enabled by our representations, including novel-view synthesis of dynamic urban scenes, unsupervised 3D instance segmentation, and unsupervised 3D cuboid detection. To compare to prior work, we also evaluate on KITTI and Virtual KITTI 2, surpassing state-of-the-art methods that rely on ground-truth 3D bounding box annotations while being 10x faster to train.
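The abstract describes two mechanisms concretely enough to sketch: the three-branch hash-table factorization (static, dynamic, far-field) and the unlabeled reconstruction losses (photometric, geometric from sparse LiDAR, feature-metric). The PyTorch sketch below is a minimal illustration under stated assumptions, not the authors' implementation: `HashGrid` is a single-level, nearest-vertex simplification of an Instant-NGP-style multiresolution hash encoding, and the class names, feature widths, density-weighted blending rule, and loss weights are all hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HashGrid(nn.Module):
    """Single-level learnable hash table (a simplified stand-in for a
    multiresolution hash encoding). Maps coords in [0, 1]^n to features."""
    def __init__(self, n_entries=2**16, n_dims=3, n_features=2, resolution=128):
        super().__init__()
        self.table = nn.Parameter(1e-4 * torch.randn(n_entries, n_features))
        self.n_entries = n_entries
        self.resolution = resolution
        # Large primes for spatial hashing (values from Instant-NGP).
        self.register_buffer(
            "primes", torch.tensor([1, 2654435761, 805459861, 3674653429][:n_dims])
        )

    def forward(self, x):                      # x: (N, n_dims) in [0, 1]
        idx = (x.clamp(0, 1) * (self.resolution - 1)).long()
        h = idx[..., 0] * self.primes[0]       # XOR-hash the grid vertex
        for d in range(1, idx.shape[-1]):
            h = torch.bitwise_xor(h, idx[..., d] * self.primes[d])
        return self.table[h % self.n_entries]  # (N, n_features)

class SUDSBranches(nn.Module):
    """Three branches: static f(x), dynamic f(x, t), far-field env map f(dir)."""
    def __init__(self, n_features=2):
        super().__init__()
        self.static = HashGrid(n_dims=3, n_features=n_features)
        self.dynamic = HashGrid(n_dims=4, n_features=n_features)  # space-time
        self.far = HashGrid(n_dims=3, n_features=n_features)      # view directions
        self.head_s = nn.Linear(n_features, 4)  # -> (density, rgb)
        self.head_d = nn.Linear(n_features, 4)
        self.head_f = nn.Linear(n_features, 3)  # far field: color only

    def forward(self, x, t, d):
        # x: (N, 3) positions, t: (N, 1) normalized time, d: (N, 3) unit dirs.
        fs = self.head_s(self.static(x))
        fd = self.head_d(self.dynamic(torch.cat([x, t], dim=-1)))
        sigma_s, rgb_s = F.relu(fs[..., :1]), torch.sigmoid(fs[..., 1:])
        sigma_d, rgb_d = F.relu(fd[..., :1]), torch.sigmoid(fd[..., 1:])
        # Density-weighted blend of the static and dynamic branches
        # (an assumption for this sketch; the paper defines its own compositing).
        sigma = sigma_s + sigma_d
        rgb = (sigma_s * rgb_s + sigma_d * rgb_d) / (sigma + 1e-8)
        far_rgb = torch.sigmoid(self.head_f(self.far(0.5 * d + 0.5)))
        return sigma, rgb, far_rgb

def reconstruction_loss(pred_rgb, gt_rgb, pred_depth, lidar_depth,
                        pred_feat, gt_feat, w_geo=0.1, w_feat=0.1):
    """Photometric + geometric (sparse LiDAR) + feature-metric terms;
    the weights are illustrative."""
    l_photo = F.mse_loss(pred_rgb, gt_rgb)
    hit = lidar_depth > 0                     # LiDAR is sparse: supervise hit rays only
    l_geo = F.l1_loss(pred_depth[hit], lidar_depth[hit])
    l_feat = F.mse_loss(pred_feat, gt_feat)   # e.g. distilled self-supervised descriptors
    return l_photo + w_geo * l_geo + w_feat * l_feat
```

Querying is per-sample: `model(x, t, d)` with `x` in the scene's unit cube, `t` a normalized timestamp, and `d` a unit view direction. In a full renderer the far-field color would be composited behind each ray during volume rendering, and a 2D optical-flow term (omitted here) would link the dynamic branch across frames.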


