TCOVIS: Temporally Consistent Online Video Instance Segmentation

09/21/2023
by   Junlong Li, et al.
0

In recent years, significant progress has been made in video instance segmentation (VIS), with many offline and online methods achieving state-of-the-art performance. While offline methods have the advantage of producing temporally consistent predictions, they are not suitable for real-time scenarios. Conversely, online methods are more practical, but maintaining temporal consistency remains a challenging task. In this paper, we propose a novel online method for video instance segmentation, called TCOVIS, which fully exploits the temporal information in a video clip. The core of our method consists of a global instance assignment strategy and a spatio-temporal enhancement module, which improve the temporal consistency of the features from two aspects. Specifically, we perform global optimal matching between the predictions and ground truth across the whole video clip, and supervise the model with the global optimal objective. We also capture the spatial feature and aggregate it with the semantic feature between frames, thus realizing the spatio-temporal enhancement. We evaluate our method on four widely adopted VIS benchmarks, namely YouTube-VIS 2019/2021/2022 and OVIS, and achieve state-of-the-art performance on all benchmarks without bells-and-whistles. For instance, on YouTube-VIS 2021, TCOVIS achieves 49.5 AP and 61.3 AP with ResNet-50 and Swin-L backbones, respectively. Code is available at https://github.com/jun-long-li/TCOVIS.

READ FULL TEXT

page 2

page 4

page 8

page 9

research
04/13/2021

Crossover Learning for Fast Online Video Instance Segmentation

Modeling temporal visual context across frames is critical for video ins...
research
10/30/2022

Two-Level Temporal Relation Model for Online Video Instance Segmentation

In Video Instance Segmentation (VIS), current approaches either focus on...
research
08/22/2022

InstanceFormer: An Online Video Instance Segmentation Framework

Recent transformer-based offline video instance segmentation (VIS) appro...
research
12/08/2021

VISOLO: Grid-Based Space-Time Aggregation for Efficient Online Video Instance Segmentation

For online video instance segmentation (VIS), fully utilizing the inform...
research
04/11/2019

MAIN: Multi-Attention Instance Network for Video Segmentation

Instance-level video segmentation requires a solid integration of spatia...
research
04/03/2023

Video Instance Segmentation in an Open-World

Existing video instance segmentation (VIS) approaches generally follow a...
research
04/18/2022

Temporally Efficient Vision Transformer for Video Instance Segmentation

Recently vision transformer has achieved tremendous success on image-lev...

Please sign up or login with your details

Forgot password? Click here to reset