Learning Ego 3D Representation as Ray Tracing

06/08/2022
by   Jiachen Lu, et al.

A self-driving perception model aims to extract 3D semantic representations from multiple cameras collectively into the bird's-eye-view (BEV) coordinate frame of the ego car in order to ground the downstream planner. Existing perception methods often either rely on error-prone depth estimation of the whole scene or learn sparse virtual 3D representations without the target geometry structure; both approaches remain limited in performance and/or capability. In this paper, we present a novel end-to-end architecture for ego 3D representation learning from an arbitrary number of unconstrained camera views. Inspired by the ray tracing principle, we design a polarized grid of "imaginary eyes" as the learnable ego 3D representation and formulate the learning process with an adaptive attention mechanism in conjunction with 3D-to-2D projection. Critically, this formulation allows rich 3D representations to be extracted from 2D images without any depth supervision, and with a built-in geometry structure consistent w.r.t. BEV. Despite the model's simplicity and versatility, extensive experiments on standard BEV visual tasks (e.g., camera-based 3D object detection and BEV segmentation) show that it significantly outperforms all state-of-the-art alternatives, with an extra advantage in computational efficiency from multi-task learning.


research
08/13/2020

Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by Implicitly Unprojecting to 3D

The goal of perception for autonomous vehicles is to extract semantic re...
research
06/27/2022

LaRa: Latents and Rays for Multi-Camera Bird's-Eye-View Semantic Segmentation

Recent works in autonomous driving have widely adopted the bird's-eye-vi...
research
06/30/2022

PolarFormer: Multi-camera 3D Object Detection with Polar Transformers

3D object detection in autonomous driving aims to reason "what" and "whe...
research
04/09/2021

SVDistNet: Self-Supervised Near-Field Distance Estimation on Surround View Fisheye Cameras

A 360 perception of scene geometry is essential for automated driving, n...
research
07/09/2023

Parametric Depth Based Feature Representation Learning for Object Detection and Segmentation in Bird's Eye View

Recent vision-only perception models for autonomous driving achieved pro...
research
07/29/2021

Geometry Uncertainty Projection Network for Monocular 3D Object Detection

Geometry Projection is a powerful depth estimation method in monocular 3...
research
07/25/2023

HeightFormer: Explicit Height Modeling without Extra Data for Camera-only 3D Object Detection in Bird's Eye View

Vision-based Bird's Eye View (BEV) representation is an emerging percept...
