Incorporating Learnt Local and Global Embeddings into Monocular Visual SLAM

by Huaiyang Huang, et al.

Traditional approaches to Visual Simultaneous Localization and Mapping (VSLAM) rely on low-level vision information for state estimation, such as handcrafted local features or the image gradient. While significant progress has been made along this line, the performance of state-of-the-art systems generally degrades under more challenging conditions for monocular VSLAM, e.g., varying illumination. As a consequence, the robustness and accuracy of monocular VSLAM remain open concerns. This paper presents a monocular VSLAM system that fully exploits learnt features for better state estimation. The proposed system leverages both learnt local features and global embeddings in different modules of the system: direct camera pose estimation, inter-frame feature association, and loop closure detection. With a probabilistic interpretation of keypoint prediction, we formulate camera pose tracking in a direct manner and parameterize local features with uncertainty taken into account. To alleviate the quantization effect, we adapt the mapping module to generate better 3D landmarks, which helps guarantee the system's robustness. Detecting temporal loop closures via deep global embeddings further improves the robustness and accuracy of the proposed system. The proposed system is extensively evaluated on public datasets (Tsukuba, EuRoC, and KITTI) and compared against state-of-the-art methods. Its competitive camera pose estimation performance confirms the effectiveness of our method.
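
The abstract does not say how loop closures are retrieved from the global embeddings. As a rough illustration only, the sketch below assumes one embedding vector per keyframe and compares them by cosine similarity. The function names, the `min_gap` exclusion window, and the `threshold` value are all hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_loop_candidate(query, keyframe_embeddings, min_gap=2, threshold=0.9):
    """Return (index, similarity) of the most similar earlier keyframe, or None.

    keyframe_embeddings is ordered in time. The most recent `min_gap` entries
    are skipped so that temporally adjacent frames are not reported as loops.
    Both `min_gap` and `threshold` are illustrative assumptions.
    """
    searchable = keyframe_embeddings[:max(0, len(keyframe_embeddings) - min_gap)]
    best = None
    for idx, emb in enumerate(searchable):
        sim = cosine_similarity(query, emb)
        if sim >= threshold and (best is None or sim > best[1]):
            best = (idx, sim)
    return best


# Toy usage with 2-D embeddings; real global embeddings would be much larger.
keyframes = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [1.0, 0.1]]
candidate = find_loop_candidate([1.0, 0.05], keyframes)
```

A full system would typically verify such a candidate geometrically before accepting it as a loop closure; that step is outside this sketch.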




Position Estimation of Camera Based on Unsupervised Learning

It is an exciting task to recover the scene's 3d-structure and camera po...

Fast Direct Stereo Visual SLAM

We propose a novel approach for fast and accurate stereo visual Simultan...

Pose Graph Optimization for Unsupervised Monocular Visual Odometry

Unsupervised Learning based monocular visual odometry (VO) has lately dr...

DXSLAM: A Robust and Efficient Visual SLAM System with Deep Features

A robust and efficient Simultaneous Localization and Mapping (SLAM) syst...

Low-latency Visual SLAM with Appearance-Enhanced Local Map Building

A local map module is often implemented in modern VO/VSLAM systems to im...

TransCamP: Graph Transformer for 6-DoF Camera Pose Estimation

Camera pose estimation or camera relocalization is the centerpiece in nu...

Metric Monocular Localization Using Signed Distance Fields

Metric localization plays a critical role in vision-based navigation. Fo...
