STEm-Seg: Spatio-temporal Embeddings for Instance Segmentation in Videos

03/18/2020
by Ali Athar, et al.

Existing methods for instance segmentation in videos typically involve multi-stage pipelines that follow the tracking-by-detection paradigm and model a video clip as a sequence of images. Multiple networks are used to detect objects in individual frames and then associate these detections over time. As a result, these methods are often not trainable end-to-end and are highly tailored to specific tasks. In this paper, we propose a different approach that is well suited to a variety of tasks involving instance segmentation in videos. In particular, we model a video clip as a single 3D spatio-temporal volume and propose a novel approach that segments and tracks instances across space and time in a single stage. Our problem formulation is centered on the idea of spatio-temporal embeddings, which are trained to cluster pixels belonging to a specific object instance over an entire video clip. To this end, we introduce (i) novel mixing functions that enhance the feature representation of spatio-temporal embeddings, and (ii) a single-stage, proposal-free network that can reason about temporal context. Our network is trained end-to-end to learn spatio-temporal embeddings as well as the parameters required to cluster these embeddings, thus simplifying inference. Our method achieves state-of-the-art results across multiple datasets and tasks.
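To make the clustering idea concrete, the following is a minimal sketch of how per-pixel embeddings over a whole clip could be grouped into one instance. It is not taken from the paper: the Gaussian membership score, the 0.5 threshold, and the names `cluster_probability`, `segment_instance`, `center`, and `variance` are all assumptions chosen for illustration. The abstract only states that the network learns the embeddings together with the parameters needed to cluster them.

```python
import numpy as np

def cluster_probability(embeddings, center, variance):
    """Score how strongly each pixel in a clip belongs to one instance.

    embeddings: (T, H, W, E) array, one E-dim embedding per pixel per frame.
    center:     (E,) instance center in embedding space (assumed given).
    variance:   (E,) positive per-dimension variance (assumed given).
    """
    diff = embeddings - center                      # broadcasts over T, H, W
    dist = np.sum(diff ** 2 / variance, axis=-1)    # scaled squared distance
    return np.exp(-0.5 * dist)                      # (T, H, W), values in [0, 1]

def segment_instance(embeddings, center, variance, threshold=0.5):
    """Boolean mask over the full spatio-temporal volume for one instance."""
    return cluster_probability(embeddings, center, variance) > threshold

# Toy clip: 2 frames of 4x4 pixels with 3-dim embeddings (hypothetical data).
T, H, W, E = 2, 4, 4, 3
emb = np.full((T, H, W, E), 5.0)   # background pixels lie far from the center
emb[:, 1:3, 1:3, :] = 0.0          # a 2x2 object that persists across both frames
mask = segment_instance(emb, center=np.zeros(E), variance=np.ones(E))
```

Because the mask is computed over the entire (T, H, W) volume at once, the same instance is segmented and associated across frames in a single step, with no separate per-frame detection and linking stage.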


research
04/22/2022

Tag-Based Attention Guided Bottom-Up Approach for Video Instance Segmentation

Video Instance Segmentation is a fundamental computer vision task that d...
research
07/03/2018

Deep Spatio-Temporal Random Fields for Efficient Video Segmentation

In this work we introduce a time- and memory-efficient method for struct...
research
12/19/2019

Learning a Spatio-Temporal Embedding for Video Instance Segmentation

We present a novel embedding approach for video instance segmentation. O...
research
06/15/2023

Single-Stage Visual Query Localization in Egocentric Videos

Visual Query Localization on long-form egocentric videos requires spatio...
research
09/01/2016

Segmentation Free Object Discovery in Video

In this paper we present a simple yet effective approach to extend witho...
research
09/29/2022

4D-StOP: Panoptic Segmentation of 4D LiDAR using Spatio-temporal Object Proposal Generation and Aggregation

In this work, we present a new paradigm, called 4D-StOP, to tackle the t...
research
03/16/2023

PSVT: End-to-End Multi-person 3D Pose and Shape Estimation with Progressive Video Transformers

Existing methods of multi-person video 3D human Pose and Shape Estimatio...
