Multi-scale Alternated Attention Transformer for Generalized Stereo Matching

by Wei Miao, et al.

Recent stereo matching networks achieve strong performance by introducing an epipolar line constraint that limits the matching range between the two views. However, in complicated real-world scenarios, feature information drawn from within a single epipolar line alone is too weak to support stereo matching. In this paper, we present a simple but highly effective network called Alternated Attention U-shaped Transformer (AAUformer), which balances the influence of the epipolar line in the dual view and the single view to achieve strong generalization performance. Compared to other models, our model has two main designs: 1) to better capture the local semantic features of the single view at the pixel level, we introduce window self-attention to break the limits of intra-row self-attention, and we completely replace the convolutional network to obtain denser features before cross-matching; 2) a multi-scale alternated attention backbone network is designed to extract invariant features and to achieve a coarse-to-fine matching process for hard-to-discriminate regions. We performed a series of comparative studies and ablation studies on several mainstream stereo matching datasets. The results demonstrate that our model achieves state-of-the-art results on the Scene Flow dataset, and that its fine-tuning performance is competitive on the KITTI 2015 dataset. In addition, in cross-domain generalization experiments on synthetic and real-world datasets, our model outperforms several state-of-the-art works.
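
The difference between intra-row self-attention and window self-attention described in the abstract can be illustrated with a minimal NumPy sketch. This is not the AAUformer implementation: the function names (`row_self_attention`, `window_self_attention`), the single-head attention, and the identity query/key/value projections are all assumptions made for illustration. It only shows which pixels are allowed to attend to each other in each scheme.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(tokens):
    # Single-head self-attention with identity Q/K/V projections
    # (an illustrative simplification, not the paper's layer).
    # tokens: (groups, n, c); attention is computed within each group.
    scale = tokens.shape[-1] ** -0.5
    scores = tokens @ tokens.transpose(0, 2, 1) * scale
    return softmax(scores) @ tokens

def row_self_attention(feat):
    # feat: (H, W, C). Each image row (one epipolar line in a
    # rectified pair) forms its own group of W tokens.
    return attention(feat)

def window_self_attention(feat, win):
    # feat: (H, W, C). Pixels are grouped into non-overlapping
    # win x win windows, so tokens from neighbouring rows interact.
    H, W, C = feat.shape
    assert H % win == 0 and W % win == 0, "H and W must be divisible by win"
    x = feat.reshape(H // win, win, W // win, win, C)
    x = x.transpose(0, 2, 1, 3, 4).reshape(-1, win * win, C)
    x = attention(x)
    x = x.reshape(H // win, W // win, win, win, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(H, W, C)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((4, 8, 3))
    print(row_self_attention(feat).shape)        # same shape as the input
    print(window_self_attention(feat, 2).shape)  # same shape as the input
```

In this sketch, a change to a pixel in row 1 leaves the row-attention output for row 0 untouched, whereas the window-attention output at a pixel in row 0 of the same window does change. That is the sense in which window attention lets single-view features draw on context beyond one epipolar line.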



