MASt3R-SLAM: Real-Time Dense SLAM
with 3D Reconstruction Priors

Imperial College London
* Authors contributed equally to this work
Code is planned to be released in February.

Abstract

We present a real-time monocular dense SLAM system designed bottom-up from MASt3R, a two-view 3D reconstruction and matching prior. Equipped with this strong prior, our system is robust on in-the-wild video sequences despite making no assumption of a fixed or parametric camera model beyond a unique camera centre. We introduce efficient methods for pointmap matching, camera tracking and local fusion, graph construction and loop closure, and second-order global optimisation. With known calibration, a simple modification to the system achieves state-of-the-art performance across various benchmarks. Altogether, we propose a plug-and-play monocular SLAM system capable of producing globally consistent poses and dense geometry while operating at 15 FPS.

Real-time large-scale office reconstruction using an uncalibrated RGB camera (playback at 8x speed).

Method

System Diagram

The fundamental building blocks are MASt3R, which outputs pointmaps in a common coordinate frame given two images, and our efficient pointmap matching. This is used in the frontend for camera tracking and pointmap fusion, as well as in the backend for loop closure and large-scale global optimisation.

Generic Camera Model: Pointmap to Rays

For each frame, our system defines a generic central camera model by normalising a pointmap into rays. This enables SLAM with time-varying camera models, such as highly dynamic zooming.
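
The normalisation step can be illustrated with a minimal NumPy sketch. It assumes the pointmap is an (H, W, 3) array of 3D points expressed in the camera's own frame, so that dividing each point by its norm gives a unit ray through the camera centre. The function name `pointmap_to_rays` and the `eps` guard are hypothetical and chosen for illustration only.

```python
import numpy as np

def pointmap_to_rays(pointmap, eps=1e-8):
    """Normalise an (H, W, 3) pointmap into unit-norm rays.

    Each pixel's 3D point is divided by its distance from the camera
    centre (the origin), leaving only a viewing direction per pixel.
    """
    norms = np.linalg.norm(pointmap, axis=-1, keepdims=True)
    # eps guards against division by zero for degenerate points.
    return pointmap / np.maximum(norms, eps)

# Toy 2x2 pointmap (made-up values, not real network output).
pointmap = np.array([
    [[0.0, 0.0, 2.0], [1.0, 0.0, 1.0]],
    [[0.0, 3.0, 4.0], [-1.0, -1.0, 1.0]],
])
rays = pointmap_to_rays(pointmap)
```

Because the rays are computed per frame from the pointmap itself, no intrinsics or distortion parameters enter this step, which is what allows the camera model to change from frame to frame.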

Efficient Pointmap Matching

Matching in 3D or feature space is too slow for real-time SLAM. Given the pointmap from DUSt3R or MASt3R in a common coordinate frame, we perform massively parallel matching by minimising the angular error between the ray from the camera centre to a 3D point and the ray queried by the current pixel.
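
The angular-error criterion can be sketched as follows. This is a brute-force stand-in that scores every pixel for every point, not the massively parallel method used in the system. It assumes the reference frame's rays are unit-norm and that the query points are already expressed in the reference frame. The function name `match_points_to_rays` and the toy ray grid are hypothetical.

```python
import numpy as np

def match_points_to_rays(ref_rays, points):
    """Match 3D points to reference pixels by smallest angular error.

    ref_rays: (H, W, 3) unit rays of the reference frame.
    points:   (N, 3) 3D points in the reference frame's coordinates.
    Returns (N, 2) integer (row, col) pixels and (N,) angles in radians.
    """
    H, W, _ = ref_rays.shape
    flat = ref_rays.reshape(-1, 3)
    # Direction from the camera centre (origin) to each 3D point.
    dirs = points / np.linalg.norm(points, axis=-1, keepdims=True)
    # Cosine of the angle between every point direction and every ray.
    cos = dirs @ flat.T                      # (N, H*W)
    best = np.argmax(cos, axis=1)            # max cosine = min angle
    best_cos = cos[np.arange(len(points)), best]
    angles = np.arccos(np.clip(best_cos, -1.0, 1.0))
    rows, cols = np.unravel_index(best, (H, W))
    return np.stack([rows, cols], axis=1), angles

# Toy 4x5 ray image built from an arbitrary grid of directions.
H, W = 4, 5
v, u = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
dirs = np.stack([u - 2.0, v - 1.5, np.full((H, W), 3.0)], axis=-1)
ref_rays = dirs / np.linalg.norm(dirs, axis=-1, keepdims=True)

# Two points lying exactly along the rays of pixels (1, 3) and (3, 0).
points = np.stack([2.5 * ref_rays[1, 3], 0.7 * ref_rays[3, 0]])
pixels, angles = match_points_to_rays(ref_rays, points)
```

The exhaustive search costs O(N·H·W) and is shown only to make the objective concrete. A real-time system would need a much cheaper search per point.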

Large-Scale Backend Optimisation

Backend optimisation ensures global consistency of poses and dense geometry. Because first-order gradient descent converges slowly, we use second-order Gauss-Newton optimisation to make large-scale updates efficiently.
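
As a reminder of how a Gauss-Newton update works, here is a minimal sketch on a toy curve-fitting problem. It does not implement the system's pose and geometry optimisation. The residual, the Jacobian and the name `gauss_newton` are assumptions made for illustration.

```python
import numpy as np

def gauss_newton(residual_fn, jacobian_fn, x0, max_iters=20, tol=1e-10):
    """Minimise 0.5 * ||r(x)||^2 with plain Gauss-Newton steps."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        r = residual_fn(x)
        J = jacobian_fn(x)
        # Normal equations: (J^T J) step = -J^T r
        step = np.linalg.solve(J.T @ J, -J.T @ r)
        x = x + step
        if np.linalg.norm(step) < tol:
            break
    return x

# Toy problem: recover (a, b) in y = a * exp(b * t) from noise-free samples.
t = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(-1.0 * t)

def residual_fn(x):
    a, b = x
    return a * np.exp(b * t) - y

def jacobian_fn(x):
    a, b = x
    e = np.exp(b * t)
    # Columns are d r / d a and d r / d b.
    return np.stack([e, a * t * e], axis=1)

estimate = gauss_newton(residual_fn, jacobian_fn, x0=[1.0, 0.0])
```

Each step solves a linear system built from the Jacobian instead of following the raw gradient, which is why convergence near the solution is typically much faster than gradient descent. In a large SLAM problem the matrix `J.T @ J` is typically sparse, so a dense solve like the one above would be replaced by a sparse solver.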

BibTeX

      @article{murai2024_mast3rslam,
        title={{MASt3R-SLAM}: Real-Time Dense {SLAM} with {3D} Reconstruction Priors},
        author={Murai, Riku and Dexheimer, Eric and Davison, Andrew J.},
        journal={arXiv preprint},
        year={2024},
      }