Learning to Estimate Hidden Motions with Global Motion Aggregation

by Shihao Jiang, et al.

Occlusions pose a significant challenge to optical flow algorithms that rely on local evidence. We consider an occluded point to be one that is imaged in the first frame but not in the next, a slight overloading of the standard definition since it also includes points that move out of frame. Estimating the motion of these points is extremely difficult, particularly in the two-frame setting. Previous work either relies on CNNs to learn occlusions, with limited success, or requires multiple frames to reason about occlusions using temporal smoothness. In this paper, we argue that the occlusion problem can be better solved in the two-frame case by modelling image self-similarities. We introduce a global motion aggregation module, a transformer-based approach that finds long-range dependencies between pixels in the first image and performs global aggregation on the corresponding motion features. We demonstrate that optical flow estimates in occluded regions can be significantly improved without degrading performance in non-occluded regions. This approach obtains new state-of-the-art results on the challenging Sintel dataset, improving the average end-point error by 13.6% on Sintel Final and 13.7% on Sintel Clean. At the time of submission, our method ranks first on these benchmarks among all published and unpublished approaches. Code is available at https://github.com/zacjiang/GMA.
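The aggregation idea in the abstract can be illustrated as single-head attention where queries and keys come from first-image (context) features, so the attention weights encode image self-similarity, while the values come from motion features. The following is a minimal NumPy sketch, not the authors' implementation. It assumes a residual update `motion + alpha * aggregated` with a scalar weight `alpha` and scaled dot-product similarity. The function name `global_motion_aggregation` and the projection matrices `w_q`, `w_k`, `w_v` are hypothetical and are not the API of the GMA repository.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def global_motion_aggregation(context, motion, w_q, w_k, w_v, alpha):
    """Hypothetical sketch of appearance-guided motion aggregation.

    context: (N, C) first-image features for N = H * W pixels
    motion:  (N, D) motion features for the same pixels
    w_q, w_k: (C, K) query/key projections (assumed learned weights)
    w_v:     (D, D) value projection applied to motion features
    alpha:   scalar weight on the aggregated motion (assumed residual form)
    """
    q = context @ w_q                        # (N, K) queries from appearance
    k = context @ w_k                        # (N, K) keys from appearance
    scores = q @ k.T / np.sqrt(q.shape[1])   # (N, N) pairwise self-similarity
    attn = softmax(scores, axis=-1)          # each row sums to 1
    aggregated = attn @ (motion @ w_v)       # (N, D) motion pooled over all pixels
    return motion + alpha * aggregated, attn


# Usage with random features on a tiny 3x4 "image" (N = 12 pixels).
rng = np.random.default_rng(0)
N, C, D, K = 12, 8, 4, 6
ctx = rng.standard_normal((N, C))
mot = rng.standard_normal((N, D))
out, attn = global_motion_aggregation(
    ctx, mot,
    rng.standard_normal((C, K)), rng.standard_normal((C, K)),
    rng.standard_normal((D, D)), alpha=0.5,
)
print(out.shape, attn.shape)
```

Because the weights depend only on appearance, a pixel with no reliable match in the second frame can draw motion from visually similar pixels elsewhere in the first image. Under the self-similarity argument above, such pixels would often belong to the same object. Note that the full `(N, N)` attention matrix grows quadratically with the number of pixels, so in practice such a module is typically applied on downsampled feature maps.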




SSTM: Spatiotemporal Recurrent Transformers for Multi-frame Optical Flow Estimation

Inaccurate optical flow estimates in and near occluded regions, and out-...

SplatFlow: Learning Multi-frame Optical Flow via Splatting

Occlusion problem remains a key challenge in Optical Flow Estimation (OF...

Memory Enhanced Global-Local Aggregation for Video Object Detection

How do humans recognize an object in a piece of video? Due to the deteri...

Aggregation of local parametric candidates with exemplar-based occlusion handling for optical flow

Handling all together large displacements, motion details and occlusions...

HandOccNet: Occlusion-Robust 3D Hand Mesh Estimation Network

Hands are often severely occluded by objects, which makes 3D hand mesh e...

A Deep Temporal Fusion Framework for Scene Flow Using a Learnable Motion Model and Occlusions

Motion estimation is one of the core challenges in computer vision. With...

GMA3D: Local-Global Attention Learning to Estimate Occluded Motions of Scene Flow

Scene flow is the collection of each point motion information in the 3D ...
