TarViS: A Unified Approach for Target-based Video Segmentation

01/06/2023
by Ali Athar, et al.

The general domain of video segmentation is currently fragmented into different tasks spanning multiple benchmarks. Despite rapid progress in the state-of-the-art, current methods are overwhelmingly task-specific and cannot conceptually generalize to other tasks. Inspired by recent approaches with multi-task capability, we propose TarViS: a novel, unified network architecture that can be applied to any task that requires segmenting a set of arbitrarily defined 'targets' in video. Our approach is flexible with respect to how tasks define these targets, since it models the latter as abstract 'queries' which are then used to predict pixel-precise target masks. A single TarViS model can be trained jointly on a collection of datasets spanning different tasks, and can hot-swap between tasks during inference without any task-specific retraining. To demonstrate its effectiveness, we apply TarViS to four different tasks, namely Video Instance Segmentation (VIS), Video Panoptic Segmentation (VPS), Video Object Segmentation (VOS) and Point Exemplar-guided Tracking (PET). Our unified, jointly trained model achieves state-of-the-art performance on 5/7 benchmarks spanning these four tasks, and competitive performance on the remaining two.
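
The idea of modeling targets as abstract queries can be illustrated with a toy sketch. This is not the TarViS implementation: the abstract does not say how queries are built or decoded, so the code assumes that a mask is obtained by thresholding the inner product between a query vector and per-pixel feature vectors, which is a common pattern in query-based segmentation. The helpers query_from_mask and query_from_point, the threshold value, and the hand-written features are all hypothetical and only mimic how a first-frame mask or a single point could define a target.

```python
from typing import List

Vector = List[float]
Frame = List[List[Vector]]   # [H][W] grid of feature vectors
Video = List[Frame]          # [T] frames


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def predict_masks(queries: List[Vector], features: Video,
                  threshold: float = 0.5) -> List[List[List[List[bool]]]]:
    """Return one boolean mask per query, indexed as [query][t][y][x].

    Assumption: a pixel belongs to a target when the inner product of
    its feature vector with the target's query exceeds `threshold`.
    """
    return [
        [[[dot(q, px) > threshold for px in row] for row in frame]
         for frame in features]
        for q in queries
    ]


def query_from_mask(features: Video, t: int, mask: List[List[bool]]) -> Vector:
    """Hypothetical mask-defined target: average the features under the mask."""
    picked = [features[t][y][x]
              for y, row in enumerate(mask)
              for x, inside in enumerate(row) if inside]
    if not picked:
        raise ValueError("mask selects no pixels")
    dim = len(picked[0])
    return [sum(v[d] for v in picked) / len(picked) for d in range(dim)]


def query_from_point(features: Video, t: int, y: int, x: int) -> Vector:
    """Hypothetical point-defined target: copy the feature at the clicked pixel."""
    return list(features[t][y][x])


# Two frames of 1x2 pixels with 2-dimensional features (made-up values).
# The two "objects" swap positions between frame 0 and frame 1.
video: Video = [
    [[[1.0, 0.0], [0.0, 1.0]]],
    [[[0.0, 1.0], [1.0, 0.0]]],
]

point_query = query_from_point(video, t=0, y=0, x=0)
mask_query = query_from_mask(video, t=0, mask=[[False, True]])

# The same mask predictor serves both targets; only the queries differ.
masks = predict_masks([point_query, mask_query], video)
```

In this sketch, switching between tasks only changes how the queries are produced, while the mask prediction step stays the same. That mirrors, in a very reduced form, the flexibility the abstract describes.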

research
05/30/2022

TubeFormer-DeepLab: Video Mask Transformer

We present TubeFormer-DeepLab, the first attempt to tackle multiple core...
research
06/11/2023

3rd Place Solution for PVUW Challenge 2023: Video Panoptic Segmentation

In order to deal with the task of video panoptic segmentation in the wil...
research
06/19/2020

Video Panoptic Segmentation

Panoptic segmentation has become a new standard of visual recognition ta...
research
10/12/2022

A Generalist Framework for Panoptic Segmentation of Images and Videos

Panoptic segmentation assigns semantic and instance ID labels to every p...
research
12/13/2022

Egocentric Video Task Translation

Different video understanding tasks are typically treated in isolation, ...
research
11/12/2021

Learning Online for Unified Segmentation and Tracking Models

Tracking requires building a discriminative model for the target in the ...
research
06/10/2019

UniDual: A Unified Model for Image and Video Understanding

Although a video is effectively a sequence of images, visual perception ...
