Mining Relations among Cross-Frame Affinities for Video Semantic Segmentation

07/21/2022
by Guolei Sun, et al.

The central question in video semantic segmentation (VSS) is how to leverage temporal information for prediction. Previous efforts have mainly focused on developing new techniques for computing cross-frame affinities, such as optical flow and attention. This paper instead contributes from a different angle: it mines relations among cross-frame affinities, which enables better aggregation of temporal information. We explore relations among affinities in two aspects: single-scale intrinsic correlations and multi-scale relations. Inspired by traditional feature processing, we propose Single-scale Affinity Refinement (SAR) and Multi-scale Affinity Aggregation (MAA). To make MAA feasible, we propose a Selective Token Masking (STM) strategy that selects a consistent subset of reference tokens across scales when calculating affinities, which also improves the efficiency of our method. Finally, the cross-frame affinities strengthened by SAR and MAA are used to adaptively aggregate temporal information. Our experiments demonstrate that the proposed method performs favorably against state-of-the-art VSS methods. The code is publicly available at https://github.com/GuoleiSun/VSS-MRCFA
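
To make the notion of a cross-frame affinity concrete, here is a minimal sketch of the generic attention-style baseline that the abstract refers to: each token of the current frame is compared with every token of a reference frame, and the resulting weights are used to pool reference features. It does not reproduce SAR, MAA, or STM, whose details are not given here. The function names, the scaled dot-product form, and the token and channel sizes are all assumptions chosen for illustration, not the authors' implementation.

```python
import numpy as np


def cross_frame_affinity(query, reference):
    """Scaled dot-product affinity between tokens of two frames.

    query:     (Nq, C) tokens of the current frame
    reference: (Nr, C) tokens of a reference (earlier) frame
    returns:   (Nq, Nr) affinity matrix
    """
    scale = 1.0 / np.sqrt(query.shape[-1])
    return (query @ reference.T) * scale


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def aggregate(query, reference):
    """Pool reference-frame features using normalized affinities as weights."""
    weights = softmax(cross_frame_affinity(query, reference), axis=-1)
    return weights @ reference


# Hypothetical sizes: 6 current-frame tokens, 10 reference tokens, 8 channels.
rng = np.random.default_rng(0)
current = rng.standard_normal((6, 8))
previous = rng.standard_normal((10, 8))

fused = aggregate(current, previous)
print(fused.shape)
```

In this sketch the affinity matrix is consumed directly after normalization. The approach described in the abstract would instead process the affinities further (refinement within a scale and aggregation across scales) before the aggregation step.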

research
02/17/2021

Temporal Memory Attention for Video Semantic Segmentation

Video semantic segmentation requires to utilize the complex temporal rel...
research
04/07/2022

Coarse-to-Fine Feature Mining for Video Semantic Segmentation

The contextual information plays a core role in semantic segmentation. A...
research
08/07/2022

Exploring Long Short Range Temporal Information for Learned Video Compression

Learned video compression methods have gained a variety of interest in t...
research
01/10/2023

Video Semantic Segmentation with Inter-Frame Feature Fusion and Inner-Frame Feature Refinement

Video semantic segmentation aims to generate accurate semantic maps for ...
research
07/28/2021

Improving Video Instance Segmentation via Temporal Pyramid Routing

Video Instance Segmentation (VIS) is a new and inherently multi-task pro...
research
01/25/2022

Capturing Temporal Information in a Single Frame: Channel Sampling Strategies for Action Recognition

We address the problem of capturing temporal information for video class...
research
07/12/2023

Rectifying Noisy Labels with Sequential Prior: Multi-Scale Temporal Feature Affinity Learning for Robust Video Segmentation

Noisy label problems are inevitably in existence within medical image se...
