MixFormer: End-to-End Tracking with Iterative Mixed Attention

03/21/2022
by Yutao Cui, et al.

Tracking often uses a multi-stage pipeline of feature extraction, target information integration, and bounding box estimation. To simplify this pipeline and unify feature extraction with target information integration, we present a compact tracking framework, termed MixFormer, built upon transformers. Our core design exploits the flexibility of attention operations through a Mixed Attention Module (MAM) that performs feature extraction and target information integration simultaneously. This synchronous modeling scheme allows the model to extract target-specific discriminative features and to perform extensive communication between the target and the search area. Based on MAM, we build the MixFormer tracking framework simply by stacking multiple MAMs with progressive patch embedding and placing a localization head on top. In addition, to handle multiple target templates during online tracking, we devise an asymmetric attention scheme in MAM to reduce computational cost, and propose an effective score prediction module to select high-quality templates. MixFormer sets a new state of the art on five tracking benchmarks: LaSOT, TrackingNet, VOT2020, GOT-10k, and UAV123. In particular, MixFormer-L achieves a normalized precision (NP) score of 79.9 on LaSOT and 88.9 on TrackingNet, and an EAO of 0.555 on VOT2020. We also perform in-depth ablation studies to demonstrate the effectiveness of simultaneous feature extraction and information integration. Code and trained models are publicly available at https://github.com/MCG-NJU/MixFormer.
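To make the idea of mixed attention concrete, the following is a minimal sketch and not the authors' implementation. It assumes that target and search tokens are concatenated into one key/value set, and that the asymmetric scheme means the target queries attend only to target tokens while the search queries attend to both; the abstract does not spell out these details. The names `attend` and `mixed_attention` are hypothetical, and the learned query/key/value projections, multiple heads, and patch embedding of a real transformer are omitted.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention over lists of token vectors."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(len(values[0]))])
    return out


def mixed_attention(target, search, asymmetric=True):
    """Update target and search tokens in a single attention step.

    Assumption (not stated in the abstract): with asymmetric=True the
    target queries see only target tokens, so the target branch never
    depends on the current search frame.
    """
    joint = target + search  # concatenated key/value tokens
    # Search tokens always attend to both target and search tokens.
    new_search = attend(search, joint, joint)
    if asymmetric:
        new_target = attend(target, target, target)
    else:
        new_target = attend(target, joint, joint)
    return new_target, new_search


# Toy usage: 3 target tokens and 5 search tokens of dimension 4.
random.seed(0)
target = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]
search = [[random.gauss(0, 1) for _ in range(4)] for _ in range(5)]
new_target, new_search = mixed_attention(target, search)
print(len(new_target), len(new_search))
```

Under this assumed form of asymmetry, the updated target tokens are identical for every search frame, so they could be computed once per template and reused, which is one plausible reading of how the scheme reduces cost when several templates are tracked online.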


