Unified Mask Embedding and Correspondence Learning for Self-Supervised Video Segmentation

03/17/2023
by Liulei Li, et al.

This paper addresses self-supervised learning of video object segmentation (VOS). We develop a unified framework that simultaneously models cross-frame dense correspondence for locally discriminative feature learning and embeds object-level context for target-mask decoding. As a result, it learns to perform mask-guided sequential segmentation directly from unlabeled videos, in contrast to previous efforts that usually rely on an oblique solution: cheaply "copying" labels according to pixel-wise correlations. Concretely, our algorithm alternates between i) clustering video pixels to create pseudo segmentation labels ex nihilo; and ii) using the pseudo labels to learn mask encoding and decoding for VOS. Unsupervised correspondence learning is further incorporated into this self-taught mask embedding scheme, so as to ensure the generic nature of the learnt representation and avoid cluster degeneracy. Our algorithm sets a new state of the art on two standard benchmarks (i.e., DAVIS17 and YouTube-VOS), narrowing the gap between self- and fully-supervised VOS in terms of both performance and network architecture design.
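
For readers unfamiliar with the label "copying" that the abstract contrasts against, the following is a minimal sketch of correspondence-based label propagation. It is not the method proposed in the paper. The function name `propagate_labels`, the cosine-similarity affinity, the `temperature` value, and the toy feature vectors are all assumptions made for illustration.

```python
import math


def propagate_labels(ref_feats, ref_labels, qry_feats, num_classes, temperature=0.1):
    """Copy labels from reference pixels to query pixels via softmax affinity.

    ref_feats:  list of feature vectors, one per reference pixel
    ref_labels: list of integer class ids, one per reference pixel
    qry_feats:  list of feature vectors, one per query pixel
    Returns one soft label distribution (list of floats) per query pixel.
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    ref = [normalize(v) for v in ref_feats]
    out = []
    for q in (normalize(v) for v in qry_feats):
        # Cosine similarity to every reference pixel, scaled by the temperature.
        sims = [sum(a * b for a, b in zip(q, r)) / temperature for r in ref]
        # Numerically stable softmax over the reference pixels.
        m = max(sims)
        weights = [math.exp(s - m) for s in sims]
        total = sum(weights)
        # Each query pixel receives an affinity-weighted vote over reference labels.
        probs = [0.0] * num_classes
        for w, label in zip(weights, ref_labels):
            probs[label] += w / total
        out.append(probs)
    return out


# Toy data (hypothetical): three labeled reference pixels, two query pixels.
ref_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
ref_labels = [0, 0, 1]
qry_feats = [[0.8, 0.2], [0.1, 0.9]]

soft = propagate_labels(ref_feats, ref_labels, qry_feats, num_classes=2)
hard = [max(range(2), key=p.__getitem__) for p in soft]
```

In this sketch the predicted mask is only a weighted copy of existing labels. No mask encoder or decoder is learned, which is the limitation the abstract attributes to earlier approaches.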

research
04/09/2023

Self-Supervised Learning of Object Segmentation from Unlabeled RGB-D Videos

This work proposes a self-supervised learning system for segmenting rigi...
research
04/13/2023

Boosting Video Object Segmentation via Space-time Correspondence Learning

Current top-leading solutions for video object segmentation (VOS) typica...
research
04/22/2022

Self-Supervised Video Object Segmentation via Cutout Prediction and Tagging

We propose a novel self-supervised Video Object Segmentation (VOS) appro...
research
08/25/2023

Joint Modeling of Feature, Correspondence, and a Compressed Memory for Video Object Segmentation

Current prevailing Video Object Segmentation (VOS) methods usually perfo...
research
03/27/2022

Locality-Aware Inter-and Intra-Video Reconstruction for Self-Supervised Correspondence Learning

Our target is to learn visual correspondence from unlabeled videos. We d...
research
09/13/2019

Towards Generalizable Forgery Detection with Locality-aware AutoEncoder

With advancements of deep learning techniques, it is now possible to gen...
research
10/10/2020

Hybrid Sequence to Sequence Model for Video Object Segmentation

One-shot Video Object Segmentation (VOS) is the task of pixel-wise track...
