A Video Is Worth Three Views: Trigeminal Transformers for Video-based Person Re-identification

04/05/2021
by Xuehu Liu, et al.

Video-based person re-identification (Re-ID) aims to retrieve video sequences of the same person across non-overlapping cameras. Previous methods usually focus on a limited set of views, such as the spatial, temporal or spatial-temporal view, and therefore lack observations from different feature domains. To capture richer perceptions and extract more comprehensive video representations, in this paper we propose a novel framework named Trigeminal Transformers (TMT) for video-based person Re-ID. More specifically, we design a trigeminal feature extractor to jointly transform raw video data into the spatial, temporal and spatial-temporal domains. Besides, inspired by the great success of vision transformers, we introduce the transformer structure for video-based person Re-ID. In our work, three self-view transformers are proposed to exploit the relationships between local features for information enhancement in the spatial, temporal and spatial-temporal domains. Moreover, a cross-view transformer is proposed to aggregate the multi-view features into comprehensive video representations. The experimental results indicate that our approach achieves better performance than other state-of-the-art approaches on public Re-ID benchmarks. We will release the code for model reproduction.
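The data flow described in the abstract (three views, one self-view transformer per view, then a cross-view transformer) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: the abstract does not say how the three views are built, so average pooling over time and over space stands in for the trigeminal feature extractor; the transformers are reduced to a single scaled dot-product attention step with a residual connection and no learned weights; and the names `attend`, `trigeminal_views` and `video_representation` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys_values):
    """Scaled dot-product attention plus a residual connection.

    Assumption: no learned projections, so queries, keys and values are the
    raw tokens. A real transformer layer would add them, plus an MLP and
    normalization.
    """
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in keys_values]
        weights = softmax(scores)
        mixed = [sum(w * kv[c] for w, kv in zip(weights, keys_values))
                 for c in range(dim)]
        out.append([qc + mc for qc, mc in zip(q, mixed)])
    return out


def mean_tokens(tokens):
    """Average a list of equal-length token vectors into one vector."""
    count = len(tokens)
    return [sum(column) / count for column in zip(*tokens)]


def trigeminal_views(video):
    """Split features video[t][n] (a C-dim list) into three token sets.

    Assumption: the spatial view pools over frames, the temporal view pools
    over regions, and the spatial-temporal view keeps every token.
    """
    num_frames, num_regions = len(video), len(video[0])
    spatial = [mean_tokens([video[t][n] for t in range(num_frames)])
               for n in range(num_regions)]
    temporal = [mean_tokens(frame) for frame in video]
    spatio_temporal = [token for frame in video for token in frame]
    return spatial, temporal, spatio_temporal


def video_representation(video):
    """Self-view attention per view, then cross-view attention over the views."""
    enhanced = [attend(view, view) for view in trigeminal_views(video)]
    view_tokens = [mean_tokens(view) for view in enhanced]  # one token per view
    fused = attend(view_tokens, view_tokens)                # views attend to each other
    return [x for token in fused for x in token]            # concatenate the 3 tokens


# Toy input: 2 frames, 3 regions per frame, 4 channels per region.
video = [[[0.1 * (t + n + c) for c in range(4)] for n in range(3)]
         for t in range(2)]
representation = video_representation(video)
print(len(representation))  # 3 views x 4 channels = 12
```

Concatenating the three fused view tokens is only one possible way to form the final descriptor; the abstract does not state how the cross-view transformer's output is pooled.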

03/07/2021

Watching You: Global-guided Reciprocal Learning for Video-based Person Re-identification

Video-based person re-identification (Re-ID) aims to automatically retri...
09/05/2019

Adaptive Graph Representation Learning for Video Person Re-identification

Recent years have witnessed a great development of deep learning based v...
07/13/2021

HAT: Hierarchical Aggregation Transformers for Person Re-identification

Recently, with the advance of deep Convolutional Neural Networks (CNNs),...
03/16/2021

Dense Interaction Learning for Video-based Person Re-identification

Video-based person re-identification (re-ID) aims at matching the same p...
01/02/2023

Multi-Stage Spatio-Temporal Aggregation Transformer for Video Person Re-identification

In recent years, the Transformer architecture has shown its superiority ...
