Learning to Zoom and Unzoom

03/27/2023
by Chittesh Thavamani, et al.

Many perception systems in mobile computing, autonomous navigation, and AR/VR face strict compute constraints that are particularly challenging for high-resolution input images. Previous works propose nonuniform downsamplers that "learn to zoom" on salient image regions, reducing compute while retaining task-relevant image information. However, for tasks with spatial labels (such as 2D/3D object detection and semantic segmentation), such distortions may harm performance. In this work (LZU), we "learn to zoom" in on the input image, compute spatial features, and then "unzoom" to revert any deformations. To enable efficient and differentiable unzooming, we approximate the zooming warp with a piecewise bilinear mapping that is invertible. LZU can be applied to any task with 2D spatial input and any model with 2D spatial features, and we demonstrate this versatility by evaluating on a variety of tasks and datasets: object detection on Argoverse-HD, semantic segmentation on Cityscapes, and monocular 3D object detection on nuScenes. Interestingly, we observe boosts in performance even when high-resolution sensor data is unavailable, implying that LZU can be used to "learn to upsample" as well.
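
The invertibility argument can be illustrated in one dimension. The following is a minimal sketch, not the authors' implementation: it assumes a separable warp, so each image axis is handled by a monotone piecewise-linear map, and it assumes the map is built from a cumulative saliency profile. The names `build_knots`, `zoom_coords`, and `unzoom_coords` are hypothetical. Because the map is monotone and piecewise linear, its inverse is obtained by swapping the two knot arrays.

```python
import numpy as np

def build_knots(saliency, eps=1e-6):
    """Return (src, dst) knot arrays for a monotone piecewise-linear warp.

    saliency: non-negative weights, one per equal-width input interval.
    src: uniform knots in input space; dst: knots in zoomed space.
    Intervals with higher saliency occupy more of the zoomed space.
    (Assumed construction for illustration only.)
    """
    w = np.asarray(saliency, dtype=float) + eps  # eps keeps dst strictly increasing
    src = np.linspace(0.0, 1.0, len(w) + 1)
    dst = np.concatenate([[0.0], np.cumsum(w) / w.sum()])
    return src, dst

def zoom_coords(u, src, dst):
    """Map a zoomed-space coordinate u to the input coordinate it samples."""
    return np.interp(u, dst, src)

def unzoom_coords(x, src, dst):
    """Inverse map: input coordinate x to zoomed-space coordinate."""
    return np.interp(x, src, dst)

# Third interval is four times as salient as the others.
src, dst = build_knots([1, 1, 4, 1])
x = np.linspace(0.0, 1.0, 11)
roundtrip = zoom_coords(unzoom_coords(x, src, dst), src, dst)
```

Here `roundtrip` recovers `x` up to floating-point error, which is the property that lets features computed in the zoomed frame be mapped back to the original frame. A full 2D version would apply such maps per axis or per tile and resample feature maps with bilinear interpolation.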


