Contrastive Bidirectional Transformer for Temporal Representation Learning

06/13/2019
by   Chen Sun, et al.

This paper aims to learn representations for long sequences of continuous signals. Recently, the BERT model demonstrated the effectiveness of stacked Transformers for representing sequences of discrete signals (i.e., word tokens). Inspired by its success, we adopt the stacked Transformer architecture, but generalize its training objective to maximize the mutual information between the masked signals and the bidirectional context via a contrastive loss. This enables the model to handle continuous signals, such as visual features. We further consider the case where multiple sequences are semantically aligned at the sequence level but not at the element level (e.g., video and ASR), and propose to use a Transformer to estimate the mutual information between the two sequences, which is again maximized via a contrastive loss. We demonstrate the effectiveness of the learned representations on modeling long video sequences for action anticipation and video captioning. The results show that our method, referred to as the Contrastive Bidirectional Transformer (CBT), significantly outperforms various baselines and improves over the state of the art.
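The masked-prediction objective described above is an InfoNCE-style contrastive loss: the model's prediction for a masked position should score higher against the true (masked-out) feature than against negative features. Below is a minimal NumPy sketch of such a loss, assuming negatives are drawn from the other targets in the batch; the function name and this batch-negative scheme are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def info_nce_loss(predicted, targets, temperature=0.1):
    """InfoNCE-style contrastive loss (illustrative sketch).

    Each row of `predicted` is the model's output for a masked position;
    the matching row of `targets` is the true continuous feature there.
    Other rows in the batch serve as negatives.
    """
    # Cosine similarity: L2-normalize both sets of embeddings.
    p = predicted / np.linalg.norm(predicted, axis=1, keepdims=True)
    t = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    logits = p @ t.T / temperature          # (N, N); positives on the diagonal
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the diagonal as the correct class.
    return -np.mean(np.diag(log_probs))
```

With matched prediction/target pairs the loss approaches zero, and it grows when predictions are paired with the wrong targets, which is the mutual-information lower-bound behavior the contrastive objective relies on.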


Related research

08/08/2022
Contrastive Learning with Bidirectional Transformers for Sequential Recommendation
Contrastive learning with Transformer-based sequence encoder has gained ...

05/17/2021
TCL: Transformer-based Dynamic Graph Modelling via Contrastive Learning
Dynamic graph modeling has recently attracted much attention due to its ...

07/12/2021
CoBERL: Contrastive BERT for Reinforcement Learning
Many reinforcement learning (RL) agents require a large amount of experi...

07/12/2021
Contrastive Learning for Cold-Start Recommendation
Recommending cold-start items is a long-standing and fundamental challen...

04/03/2019
VideoBERT: A Joint Model for Video and Language Representation Learning
Self-supervised learning has become increasingly important to leverage t...

10/30/2020
Cross-Domain Sentiment Classification With Contrastive Learning and Mutual Information Maximization
Contrastive learning (CL) has been successful as a powerful representati...

02/01/2021
Semantic Grouping Network for Video Captioning
This paper considers a video caption generating network referred to as S...
