Self-Supervised Learning of Domain Invariant Features for Depth Estimation

by Hiroyasu Akada, et al.

We tackle the problem of unsupervised synthetic-to-realistic domain adaptation for single image depth estimation. An essential building block of single image depth estimation is an encoder-decoder task network that takes RGB images as input and produces depth maps as output. In this paper, we propose a novel training strategy that forces the task network to learn domain invariant representations in a self-supervised manner. Specifically, we extend self-supervised learning from traditional representation learning, which works on images from a single domain, to domain invariant representation learning, which works on images from two different domains, by utilizing an image-to-image translation network. First, we use our bidirectional image-to-image translation network to transfer domain-specific styles between the synthetic and real domains. This style transfer operation lets us obtain pairs of images that share the same content but come from different domains. Second, we jointly train our task network and a Siamese network on these paired images so that the task network becomes domain invariant. Finally, we fine-tune the task network using labeled synthetic data and unlabeled real-world data. Our training strategy improves generalization to the real-world domain. We carry out an extensive evaluation on two popular depth estimation datasets, KITTI and Make3D. The results demonstrate that our proposed method outperforms state-of-the-art methods both qualitatively and quantitatively. The source code and model weights will be made available.
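The Siamese step can be illustrated with a minimal sketch. The abstract does not state which self-supervised loss is used, so this assumes a SimSiam-style objective (symmetric negative cosine similarity between a predictor output and the other branch's projection). Every name below (`siamese_invariance_loss`, `encoder`, `projector`, `predictor`), the linear toy encoder, and the brightness scaling that stands in for synthetic-to-real translation are hypothetical illustrations, not the authors' implementation. The two inputs are meant to be a synthetic image and its translated counterpart, which share content but differ in style.

```python
import numpy as np

def negative_cosine(p, z):
    # Row-wise cosine similarity, averaged over the batch and negated,
    # so perfectly aligned feature vectors give a loss of -1.
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    return -float(np.mean(np.sum(p * z, axis=1)))

def siamese_invariance_loss(encoder, projector, predictor, x_a, x_b):
    # Hypothetical helper. x_a and x_b hold the same scene content in two
    # domains, e.g. a synthetic image and its synthetic-to-real translation.
    # The encoder weights are shared between both branches (Siamese setup).
    z_a = projector(encoder(x_a))
    z_b = projector(encoder(x_b))
    p_a, p_b = predictor(z_a), predictor(z_b)
    # Assumed SimSiam-style symmetric objective. In an autograd framework
    # z_a and z_b would be detached (stop-gradient); NumPy has no gradients,
    # so this sketch only computes the loss value.
    return 0.5 * negative_cosine(p_a, z_b) + 0.5 * negative_cosine(p_b, z_a)

# Toy stand-ins (all hypothetical): a shared linear "encoder" over
# flattened 3x8x8 images, and identity projector/predictor heads.
rng = np.random.default_rng(0)
W = rng.standard_normal((3 * 8 * 8, 16))
encoder = lambda x: x.reshape(len(x), -1) @ W
identity = lambda z: z

x_syn = rng.random((4, 3, 8, 8))     # batch of "synthetic" images
x_syn2real = 1.2 * x_syn             # crude global "style" change (brightness)
x_other = rng.random((4, 3, 8, 8))   # unrelated images with different content

loss_pair = siamese_invariance_loss(encoder, identity, identity, x_syn, x_syn2real)
loss_unrelated = siamese_invariance_loss(encoder, identity, identity, x_syn, x_other)
print(loss_pair, loss_unrelated)
```

Because the toy encoder is linear, a global brightness change only rescales the features, so the paired loss sits at its minimum of -1 (up to floating-point error) while unrelated images score higher. A real translation network alters style non-linearly, which is why the encoder has to be trained before the paired features line up this way.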



S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation

Humans can infer the 3D geometry of a scene from a sketch instead of a re...

Domain Invariant Masked Autoencoders for Self-supervised Learning from Multi-domains

Generalizing learned representations across significantly different visu...

T2Net: Synthetic-to-Realistic Translation for Solving Single-Image Depth Estimation Tasks

Current methods for single-image depth estimation use training datasets ...

Attribute Guided Unpaired Image-to-Image Translation with Semi-supervised Learning

Unpaired Image-to-Image Translation (UIT) focuses on translating images ...

SharinGAN: Combining Synthetic and Real Data for Unsupervised Geometry Estimation

We propose a novel method for combining synthetic and real images when t...

4K-HAZE: A Dehazing Benchmark with 4K Resolution Hazy and Haze-Free Images

Currently, mobile and IoT devices are in dire need of a series of method...

Shadow Transfer: Single Image Relighting For Urban Road Scenes

Illumination effects in images, specifically cast shadows and shading, h...
