Self-Supervised Video Representation Learning with Motion-Contrastive Perception

04/10/2022
by Jinyu Liu, et al.

Visual-only self-supervised learning has significantly improved video representation learning. Existing methods encourage models to learn video representations through contrastive learning or specially designed pretext tasks. However, some models tend to focus on the background, which is unimportant for learning video representations. To alleviate this problem, we propose a new view, called the long-range residual frame, to obtain more motion-specific information. Based on this view, we propose the Motion-Contrastive Perception Network (MCPNet), which consists of two branches, Motion Information Perception (MIP) and Contrastive Instance Perception (CIP), and learns generic video representations by focusing on the changing areas in videos. Specifically, the MIP branch aims to learn fine-grained motion features, and the CIP branch performs contrastive learning to learn the overall semantic information of each instance. Experiments on two benchmark datasets, UCF-101 and HMDB-51, show that our method outperforms current state-of-the-art visual-only self-supervised approaches.
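
The abstract does not define the long-range residual frame or the contrastive objective precisely, so the following is only a rough sketch under stated assumptions. It assumes a residual frame is the pixel-wise difference between two frames that are `gap` steps apart, and it uses InfoNCE as a stand-in for the contrastive loss in the CIP branch; neither choice is confirmed by the text above. The names `long_range_residual`, `info_nce`, `gap`, and `temperature` are hypothetical and chosen for illustration.

```python
import numpy as np


def long_range_residual(frames: np.ndarray, gap: int = 4) -> np.ndarray:
    """Difference between frames that are `gap` steps apart.

    ASSUMPTION: this is one plausible reading of "long-range residual
    frame"; the abstract does not give the exact definition.

    frames: array of shape (T, H, W, C).
    returns: array of shape (T - gap, H, W, C).
    """
    if gap < 1 or gap >= frames.shape[0]:
        raise ValueError("gap must be in [1, T - 1]")
    frames = frames.astype(np.float32)
    # Static background pixels cancel out; changing areas remain.
    return frames[gap:] - frames[:-gap]


def info_nce(queries: np.ndarray, keys: np.ndarray,
             temperature: float = 0.1) -> float:
    """InfoNCE loss where row i of `keys` is the positive for row i
    of `queries` and all other rows act as negatives.

    ASSUMPTION: InfoNCE is a common contrastive objective, used here
    as a stand-in; the abstract does not name the loss it uses.
    """
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = q @ k.T / temperature
    # Subtract the row maximum for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))
```

On a clip with a static background and a single moving object, the residual is zero everywhere except where the object was or now is, which is the intuition behind using such a view to steer a model away from background cues.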
