SplatSLAM

End-to-End Indoor Scene Reconstruction from Monocular Video via MASt3R-SLAM and 3DGS.

An integrated pipeline for high-fidelity indoor scene reconstruction bridging transformer-based SLAM and 3D Gaussian Splatting.

Computer Vision Course Project | Southern University of Science and Technology

SplatSLAM leverages MASt3R-SLAM and 3DGS to transform smartphone video into interactive 3D digital twins.

Abstract

We present SplatSLAM, an integrated pipeline for high-fidelity indoor reconstruction from monocular RGB input. By using the MASt3R-SLAM framework for robust trajectory estimation and dense geometric priors, we bridge the gap between learning-based SLAM and 3D Gaussian Splatting (3DGS). Our approach adds a standardized geometric refinement step based on statistical outlier removal (SOR) denoising, which removes unphysical artifacts from the reconstructed geometry. Evaluated on the TUM-RGBD benchmark and self-collected datasets, our system achieves sub-10 cm localization accuracy and photo-realistic novel view synthesis.

Methodology

1. Preprocessing: 10 FPS frame extraction
2. MASt3R-SLAM: pose estimation and dense mapping
3. SOR Denoising: geometric refinement
4. 3DGS Training: novel view synthesis (NVS) optimization
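
The preprocessing step can be illustrated with a small sketch of how frames might be picked to approximate a 10 FPS stream. This is not the project's actual extraction code: the function name `select_frame_indices` and its parameters are hypothetical, and it only computes which source frame indices to keep, leaving the video decoding to whatever tool is used.

```python
def select_frame_indices(n_frames: int, src_fps: float, target_fps: float = 10.0) -> list[int]:
    """Return the indices of source frames to keep for roughly `target_fps` output.

    Hypothetical helper: samples the source timeline at a fixed stride of
    src_fps / target_fps frames and floors each sample to a frame index.
    """
    if n_frames < 0 or src_fps <= 0 or target_fps <= 0:
        raise ValueError("n_frames must be >= 0 and frame rates must be positive")
    if target_fps >= src_fps:
        # The source is already at or below the target rate: keep every frame.
        return list(range(n_frames))

    step = src_fps / target_fps  # source frames per kept frame (> 1 here)
    indices = []
    k = 0
    while True:
        idx = int(k * step)  # floor to the nearest earlier source frame
        if idx >= n_frames:
            break
        indices.append(idx)
        k += 1
    return indices


if __name__ == "__main__":
    # One second of 30 FPS video reduced to 10 frames.
    print(select_frame_indices(30, 30.0))  # [0, 3, 6, 9, 12, 15, 18, 21, 24, 27]
```

Because the stride is strictly greater than one frame whenever downsampling happens, no source frame is selected twice.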

Experiments

SLAM Benchmarking (TUM-RGBD)

Sequence    ATE RMSE (m) ↓    ATE Mean (m)
fr1_room    0.0987            0.0909
fr1_360     0.0717            0.0667

Analysis: Both sequences stay below 10 cm ATE RMSE (9.87 cm on fr1_room, 7.17 cm on fr1_360), and this is achieved without prior camera calibration.
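
For readers unfamiliar with the two metrics, here is a minimal sketch of how ATE RMSE and ATE mean are computed from per-frame position errors. It assumes the estimated trajectory has already been associated by timestamp and aligned to the ground truth (for monocular input this normally includes a scale correction); that alignment step is omitted. The function name `ate_stats` is hypothetical and this is not the evaluation script used for the numbers above.

```python
import math


def ate_stats(estimated, ground_truth):
    """Return (rmse, mean) of the per-frame translational error in metres.

    Both inputs are equal-length sequences of (x, y, z) positions that are
    assumed to be time-associated and already aligned to a common frame.
    """
    if len(estimated) != len(ground_truth) or len(estimated) == 0:
        raise ValueError("trajectories must be non-empty and of equal length")

    # Euclidean distance between each estimated and true camera position.
    errors = [math.dist(p, q) for p, q in zip(estimated, ground_truth)]

    mean = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return rmse, mean


if __name__ == "__main__":
    est = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.1, 0.0)]
    gt = [(0.0, 0.0, 0.03), (1.0, 0.04, 0.0), (2.0, 0.0, 0.0)]
    rmse, mean = ate_stats(est, gt)
    print(f"ATE RMSE = {rmse:.4f} m, ATE mean = {mean:.4f} m")
```

RMSE weights large errors more heavily than the mean does, so it is never smaller than the mean. The table follows this pattern on both sequences.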

3DGS Fidelity: Iteration & Denoising Analysis

We found that 7,000 training iterations give the best visual balance; training for more iterations leads to overfitting artifacts. Additionally, our SOR refinement, which removes 7.3% of the points as outliers, eliminates "floaters" and yields cleaner geometric surfaces.
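
The idea behind SOR can be sketched in a few lines. The version below is a brute-force, pure-Python illustration and not the filter used in the project. A point is dropped when its mean distance to its `k` nearest neighbours exceeds the global average of that quantity by more than `std_ratio` standard deviations. The function name `sor_filter` and the values chosen for `k` and `std_ratio` are assumptions, since the settings behind the 7.3% figure are not stated here.

```python
import math
import statistics


def sor_filter(points, k=8, std_ratio=2.0):
    """Return the indices of points kept by statistical outlier removal.

    Brute-force O(n^2) sketch; real point clouds need a spatial index
    (e.g. a k-d tree) for the neighbour search.
    """
    n = len(points)
    if n <= k:
        raise ValueError("need more than k points")

    # Mean distance from each point to its k nearest neighbours.
    mean_knn = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_knn.append(sum(dists[:k]) / k)

    # Global statistics of that per-point quantity define the cut-off.
    mu = statistics.fmean(mean_knn)
    sigma = statistics.pstdev(mean_knn)
    threshold = mu + std_ratio * sigma

    return [i for i, d in enumerate(mean_knn) if d <= threshold]


if __name__ == "__main__":
    # A flat 5x5 patch of surface points plus one isolated "floater".
    cloud = [(0.1 * x, 0.1 * y, 0.0) for x in range(5) for y in range(5)]
    cloud.append((10.0, 10.0, 10.0))
    kept = sor_filter(cloud, k=4, std_ratio=2.0)
    removed = len(cloud) - len(kept)
    print(f"removed {removed} of {len(cloud)} points")
```

A smaller `std_ratio` removes more points, so the fraction discarded depends directly on these two parameters as well as on the scene.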

Hardware Constraints (8GB VRAM Stress Test)

Testing on a local RTX 4060 laptop GPU (8 GB VRAM) showed that the ViT-Large backbone is the bottleneck: VRAM overflow and the resulting memory swapping limit throughput to about 2 FPS.

Interactive Demo

This project was developed as a final project for the Computer Vision course at Southern University of Science and Technology (SUSTech).
Special thanks to Prof. Feng Zheng and Prof. Weiyu Wang for their professional guidance.