SemHint-MD: Learning from Noisy Semantic Labels for Self-Supervised Monocular Depth Estimation

by Shan Lin, et al.

Without ground-truth supervision, self-supervised depth estimation can become trapped in a local minimum because of the gradient-locality issue of the photometric loss. In this paper, we present a framework that improves depth estimation by using semantic segmentation to guide the network out of the local minimum. Prior work shares encoders between the two tasks or explicitly aligns them using priors such as the consistency between edges in the depth and segmentation maps. However, these methods usually require ground-truth or high-quality pseudo labels, which may not be easily accessible in real-world applications. In contrast, we investigate self-supervised depth estimation alongside a segmentation branch that is supervised with noisy labels produced by models pre-trained on limited data. We extend parameter sharing from the encoder to the decoder and study how the number of shared decoder parameters affects model performance. We also propose using cross-task information to refine the current depth and segmentation predictions, generating pseudo-depth and pseudo-semantic labels for training. The advantages of the proposed method are demonstrated through extensive experiments on the KITTI benchmark and on a downstream task, endoscopic tissue deformation tracking.
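The gradient-locality issue mentioned in the abstract can be illustrated with a toy 1D example. The sketch below is not the paper's method or code: the signal shape, the integer disparity search, and the names `make_signal`, `photometric_l1`, and `local_descent` are all assumptions made for illustration. It shows that an L1 photometric loss is flat when the candidate disparity is far from the true one, so a purely local search started there never moves, while a search started close enough reaches the true disparity.

```python
# Toy illustration (hypothetical setup, not the paper's implementation).

def make_signal(length, center, half_width):
    """1D 'image': zeros with a flat bright bump around `center`."""
    return [1.0 if abs(x - center) <= half_width else 0.0 for x in range(length)]

def photometric_l1(target, source, d):
    """Mean |target[x] - source[x + d]|, reading out-of-range pixels as 0."""
    total = 0.0
    for x, t in enumerate(target):
        s = source[x + d] if 0 <= x + d < len(source) else 0.0
        total += abs(t - s)
    return total / len(target)

def local_descent(losses, start):
    """Greedy search that only ever compares a disparity with its neighbours."""
    d = start
    while True:
        neighbours = [n for n in (d - 1, d + 1) if 0 <= n < len(losses)]
        nxt = min(neighbours, key=lambda n: losses[n])
        if losses[nxt] >= losses[d]:
            return d  # no neighbour is strictly better: stuck here
        d = nxt

TRUE_DISPARITY = 8  # assumed ground truth for this toy scene
target = make_signal(64, center=20, half_width=1)
source = make_signal(64, center=20 + TRUE_DISPARITY, half_width=1)

# Loss for every candidate integer disparity 0..15.
losses = [photometric_l1(target, source, d) for d in range(16)]
best = min(range(16), key=lambda d: losses[d])

print("global minimum at disparity", best)
print("local search from 0 ends at", local_descent(losses, 0))
print("local search from 6 ends at", local_descent(losses, 6))
```

In this toy setting the loss only changes once the two bumps overlap, which is why a starting point far from the true disparity receives no useful signal. An additional cue, such as the semantic guidance the abstract describes, is one way to supply information outside that narrow basin.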



The Edge of Depth: Explicit Constraints between Segmentation and Depth

In this work we study the mutual benefits of two common computer vision ...

Fully Self-Supervised Depth Estimation from Defocus Clue

Depth-from-defocus (DFD), modeling the relationship between depth and de...

MonoDVPS: A Self-Supervised Monocular Depth Estimation Approach to Depth-aware Video Panoptic Segmentation

Depth-aware video panoptic segmentation tackles the inverse projection p...

Seeing Through the Grass: Semantic Pointcloud Filter for Support Surface Learning

Mobile ground robots require perceiving and understanding their surround...

GAUSS: Guided Encoder-Decoder Architecture for Hyperspectral Unmixing with Spatial Smoothness

In recent hyperspectral unmixing (HU) literature, the application of dee...

Self-supervised Cloth Reconstruction via Action-conditioned Cloth Tracking

State estimation is one of the greatest challenges for cloth manipulatio...

Points2Polygons: Context-Based Segmentation from Weak Labels Using Adversarial Networks

In applied image segmentation tasks, the ability to provide numerous and...
