Boosting Video Object Segmentation via Space-time Correspondence Learning

04/13/2023
by Yurong Zhang, et al.

Current top-performing solutions for video object segmentation (VOS) typically follow a matching-based regime: for each query frame, the segmentation mask is inferred according to its correspondence to previously processed frames and the first annotated frame. They exploit the supervisory signals from the groundtruth masks only for learning mask prediction, without posing any constraint on space-time correspondence matching, which is nevertheless the fundamental building block of this regime. To alleviate this crucial yet commonly ignored issue, we devise a correspondence-aware training framework, which boosts matching-based VOS solutions by explicitly encouraging robust correspondence matching during network learning. By comprehensively exploring the intrinsic coherence in videos at the pixel and object levels, our algorithm reinforces the standard, fully supervised training of mask segmentation with label-free, contrastive correspondence learning. Without requiring extra annotation cost during training, causing speed delay during deployment, or incurring architectural modification, our algorithm provides solid performance gains on four widely used benchmarks, i.e., DAVIS2016&2017 and YouTube-VOS2018&2019, on top of well-known matching-based VOS solutions.
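As a rough illustration of the two ingredients the abstract names, the sketch below shows (1) matching-based mask inference, where each query pixel takes a soft label from the reference pixels it corresponds to, and (2) a generic InfoNCE-style contrastive loss that rewards correct pixel correspondences. This is not the authors' implementation: the function names `propagate_mask` and `info_nce`, the flattened `(pixels, channels)` feature layout, the cosine-similarity affinity, and the temperature of 0.07 are all assumptions chosen for illustration.

```python
import numpy as np


def _l2_normalize(x):
    # Row-wise unit normalization so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def _log_softmax(logits):
    # Numerically stable log-softmax over the last axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def propagate_mask(query_feat, ref_feat, ref_mask, temperature=0.07):
    """Matching-based mask inference (illustrative only).

    query_feat: (Nq, C) features of query-frame pixels
    ref_feat:   (Nr, C) features of reference-frame pixels
    ref_mask:   (Nr, K) one-hot or soft object labels of reference pixels
    returns:    (Nq, K) soft labels for the query pixels
    """
    sim = _l2_normalize(query_feat) @ _l2_normalize(ref_feat).T
    affinity = np.exp(_log_softmax(sim / temperature))  # rows sum to 1
    return affinity @ ref_mask


def info_nce(query_feat, ref_feat, positive_idx, temperature=0.07):
    """Generic InfoNCE loss (assumed form, not taken from the paper).

    positive_idx[i] is the index of the reference pixel that truly
    corresponds to query pixel i; all other reference pixels are negatives.
    """
    sim = _l2_normalize(query_feat) @ _l2_normalize(ref_feat).T
    log_prob = _log_softmax(sim / temperature)
    rows = np.arange(query_feat.shape[0])
    return float(-log_prob[rows, positive_idx].mean())


# Toy usage: the query frame is a shuffled copy of the reference frame.
rng = np.random.default_rng(0)
ref = rng.normal(size=(6, 16))
perm = np.array([3, 1, 5, 0, 2, 4])
query = ref[perm]
mask = np.eye(2)[np.array([0, 0, 1, 1, 0, 1])]  # two objects, one-hot

pred = propagate_mask(query, ref, mask)
loss_correct = info_nce(query, ref, perm)
loss_wrong = info_nce(query, ref, (perm + 1) % 6)
```

In this toy setup the propagated labels follow the shuffle, and the loss is lower when the positives are the true correspondences, which is the kind of signal a correspondence-aware training objective could add alongside the usual mask supervision.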


research
03/17/2023

Unified Mask Embedding and Correspondence Learning for Self-Supervised Video Segmentation

The objective of this paper is self-supervised learning of video object ...
research
10/10/2020

Hybrid Sequence to Sequence Model for Video Object Segmentation

One-shot Video Object Segmentation (VOS) is the task of pixel-wise track...
research
04/01/2019

Video Object Segmentation using Space-Time Memory Networks

We propose a novel solution for semi-supervised video object segmentatio...
research
09/27/2019

DMM-Net: Differentiable Mask-Matching Network for Video Object Segmentation

In this paper, we propose the differentiable mask-matching network (DMM-...
research
06/09/2021

Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation

This paper presents a simple yet effective approach to modeling space-ti...
research
06/01/2022

Differentiable Soft-Masked Attention

Transformers have become prevalent in computer vision due to their perfo...
research
03/14/2021

Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion

We present Modular interactive VOS (MiVOS) framework which decouples int...
