Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving

by   Xiwen Liang, et al.

Multi-task learning has emerged as a powerful paradigm to solve a range of tasks simultaneously with good efficiency in both computation resources and inference time. However, these algorithms are designed for different tasks mostly not within the scope of autonomous driving, thus making it hard to compare multi-task methods in autonomous driving. Aiming to enable the comprehensive evaluation of present multi-task learning methods in autonomous driving, we extensively investigate the performance of popular multi-task methods on the large-scale driving dataset, which covers four common perception tasks, i.e., object detection, semantic segmentation, drivable area segmentation, and lane detection. We provide an in-depth analysis of current multi-task learning methods under different common settings and find out that the existing methods make progress but there is still a large performance gap compared with single-task baselines. To alleviate this dilemma in autonomous driving, we present an effective multi-task framework, VE-Prompt, which introduces visual exemplars via task-specific prompting to guide the model toward learning high-quality task-specific representations. Specifically, we generate visual exemplars based on bounding boxes and color-based markers, which provide accurate visual appearances of target categories and further mitigate the performance gap. Furthermore, we bridge transformer-based encoders and convolutional layers for efficient and accurate unified perception in autonomous driving. Comprehensive experimental results on the diverse self-driving dataset BDD100K show that the VE-Prompt improves the multi-task baseline and further surpasses single-task models.


page 3

page 4


YOLOPv2: Better, Faster, Stronger for Panoptic Driving Perception

Over the last decade, multi-tasking learning approaches have achieved pr...

Joint 2D-3D Multi-Task Learning on Cityscapes-3D: 3D Detection, Segmentation, and Depth Estimation

This report serves as a supplementary document for TaskPrompter, detaili...

Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving

Aiming towards a holistic understanding of multiple downstream tasks sim...

SHIFT: A Synthetic Driving Dataset for Continuous Multi-Task Domain Adaptation

Adapting to a continuously evolving environment is a safety-critical cha...

FULLER: Unified Multi-modality Multi-task 3D Perception via Multi-level Gradient Calibration

Multi-modality fusion and multi-task learning are becoming trendy in 3D ...

Surround-view Fisheye BEV-Perception for Valet Parking: Dataset, Baseline and Distortion-insensitive Multi-task Framework

Surround-view fisheye perception under valet parking scenes is fundament...

Deep Multi-Task Learning for Joint Localization, Perception, and Prediction

Over the last few years, we have witnessed tremendous progress on many s...

Please sign up or login with your details

Forgot password? Click here to reset