ACE-SLAM

Scene Coordinate Regression for Real-Time SLAM

arXiv 2025

Ignacio Alzugaray, Marwan Taher, Andrew J. Davison

Dyson Robotics Lab | Imperial College London

Code: coming soon

ACE-SLAM is the first neural implicit SLAM system to use a Scene Coordinate Regression network as its scene representation, achieving strict real-time performance on live streams.

We present a novel neural RGB-D SLAM system that learns an implicit map of the scene in real time. For the first time, we explore Scene Coordinate Regression (SCR) as the core implicit map representation in a neural SLAM pipeline, training a lightweight network to map 2D image features directly to 3D global coordinates. SCR networks provide efficient, low-memory 3D map representations, enable extremely fast relocalization, and inherently preserve privacy, making them particularly well-suited for neural implicit SLAM.

Our system is the first to achieve strict real-time performance in neural implicit RGB-D SLAM using an SCR-based representation. We introduce a novel SCR architecture tailored for this purpose and describe the key design choices needed to integrate SCR into a live SLAM pipeline. The resulting framework is simple yet flexible, supporting both sparse and dense features, and operates reliably in dynamic environments without special adaptation.

ACE-SLAM Pipeline: Continual Map Learning


Features are extracted from each RGB-D frame using a pretrained and frozen feature extractor. These features are then passed to a Scene Coordinate Regression (SCR) network, which implicitly represents the 3D scene and maps them directly to global 3D coordinates. The predicted global coordinates are aligned with the local geometry derived from the depth map via RANSAC, enabling robust camera relocalization.
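
The alignment step above can be sketched as a 3D-3D RANSAC loop around a Kabsch (SVD) rigid fit. This is a minimal illustration, not the ACE-SLAM implementation: the function names `kabsch` and `ransac_align`, the iteration count, and the 5 cm inlier threshold are assumptions chosen for the example. Here `local_pts` stands for camera-frame points back-projected from the depth map, and `global_pts` for the SCR predictions of the same features.

```python
import numpy as np


def kabsch(local_pts, global_pts):
    """Least-squares rigid transform (R, t) such that global ≈ R @ local + t."""
    mu_l = local_pts.mean(axis=0)
    mu_g = global_pts.mean(axis=0)
    H = (local_pts - mu_l).T @ (global_pts - mu_g)  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_g - R @ mu_l
    return R, t


def ransac_align(local_pts, global_pts, iters=200, thresh=0.05, seed=0):
    """Robustly align local geometry to predicted global coordinates.

    iters and thresh (in metres) are illustrative defaults, not values
    taken from ACE-SLAM.
    """
    rng = np.random.default_rng(seed)
    n = len(local_pts)
    best_mask = np.zeros(n, dtype=bool)
    for _ in range(iters):
        # Three correspondences are the minimal sample for a rigid transform.
        idx = rng.choice(n, size=3, replace=False)
        R, t = kabsch(local_pts[idx], global_pts[idx])
        residuals = np.linalg.norm(local_pts @ R.T + t - global_pts, axis=1)
        mask = residuals < thresh
        if mask.sum() > best_mask.sum():
            best_mask = mask
    if best_mask.sum() < 3:
        raise RuntimeError("RANSAC found no consistent hypothesis")
    # Refit on all inliers of the best hypothesis.
    R, t = kabsch(local_pts[best_mask], global_pts[best_mask])
    return R, t, best_mask
```

The returned `(R, t)` maps camera-frame points into the global frame, so it plays the role of the camera pose. The inlier mask flags the features whose predicted coordinates agree with that pose.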

The estimated camera pose together with the reconstructed local geometry provides self-supervision signals that continually refine the SCR network, allowing it to learn the scene representation over time. Despite modeling full 3D environments, the resulting map remains extremely compact, occupying less than 1 MB for all shown scenes.
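
One way to picture this self-supervision signal: back-project the depth map into camera-frame points, move them into the global frame with the estimated pose, and use the result as regression targets for the SCR network. The sketch below assumes a pinhole camera with intrinsics `fx, fy, cx, cy` and a mean Euclidean loss. Both are illustrative choices, and the function names are hypothetical; the loss and sampling strategy actually used by ACE-SLAM are not described on this page.

```python
import numpy as np


def backproject(depth, fx, fy, cx, cy):
    """Lift a depth map (H, W) in metres to camera-frame points (H, W, 3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column and row indices
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)


def supervision_targets(depth, pose_R, pose_t, fx, fy, cx, cy):
    """Global-frame target coordinates from depth and a camera-to-world pose."""
    pts_cam = backproject(depth, fx, fy, cx, cy)
    valid = depth > 0  # pixels without a depth reading give no supervision
    targets = pts_cam @ pose_R.T + pose_t
    return targets, valid


def scr_loss(pred, targets, valid):
    """Mean Euclidean error over valid pixels (assumed loss, for illustration)."""
    err = np.linalg.norm(pred - targets, axis=-1)
    return err[valid].mean()
```

In a training loop, `pred` would be the network output for the same pixels, and the loss gradient would update only the SCR network, since the feature extractor is frozen.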

TriMLP: An Adaptive Scene Coordinate Regression Architecture


Instead of regressing the 3D coordinate directly as in traditional SCR, our network uses an MLP to predict coordinate weights on three orthogonal 3D planes, and the final 3D point is obtained as a weighted combination of these plane-based activations.
This design provides a powerful inductive bias that greatly increases representational flexibility, allowing many internal configurations to yield the same regressed 3D coordinate. As a result, the map adapts far more quickly and reliably as new parts of the scene are explored.
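
The many-to-one property can be illustrated with a toy model. This page does not specify the exact TriMLP parametrisation, so everything below is a hypothetical stand-in rather than the authors' architecture: anchor points are placed on a grid in the XY, XZ and YZ planes, a small MLP predicts one weight per anchor, and the output is the weighted sum of the anchors. All function names and sizes are assumptions.

```python
import numpy as np


def plane_anchors(n=4, extent=1.0):
    """Hypothetical anchors: an n x n grid on each of the XY, XZ and YZ planes."""
    g = np.linspace(-extent, extent, n)
    a, b = np.meshgrid(g, g, indexing="ij")
    a, b = a.ravel(), b.ravel()
    zero = np.zeros_like(a)
    xy = np.stack([a, b, zero], axis=1)
    xz = np.stack([a, zero, b], axis=1)
    yz = np.stack([zero, a, b], axis=1)
    return np.concatenate([xy, xz, yz], axis=0)  # shape (3 * n * n, 3)


def predict_weights(feature, w1, w2):
    """Toy two-layer MLP (ReLU hidden layer) mapping a feature to anchor weights."""
    return np.maximum(feature @ w1, 0.0) @ w2


def combine(weights, anchors):
    """Final 3D point as a weighted combination of the plane anchors."""
    return weights @ anchors


def equivalent_weights(weights, anchors):
    """Return a different weight vector that yields the same 3D point."""
    # anchors.T is 3 x K with K > 3, so it has a non-trivial null space.
    # With the default full SVD, the last row of vt lies in that null space.
    _, _, vt = np.linalg.svd(anchors.T)
    return weights + vt[-1]
```

Because there are far more weights than output dimensions, `equivalent_weights` always finds a second configuration that maps to the same point. Direct regression with a three-dimensional output head has no such redundancy at its output.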

Experimental Evaluation


The proposed system operates robustly across diverse scenes of varying scale and geometric complexity. It detects loop closures and corrects drift implicitly, without requiring any dedicated loop-closure module.

The method is highly efficient: all datasets are processed in strict real time, matching the frame rate at which they are captured. The learned map of each scene, although it models a full 3D environment, is stored in under 1 MB.
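
To give a feel for what a sub-1 MB map implies, the following back-of-the-envelope sketch counts the parameters of a fully connected network and converts them to storage size. The layer widths in the usage note are hypothetical, not those of ACE-SLAM, and 1 MB is taken here as 2**20 bytes.

```python
def mlp_param_count(widths):
    """Weights plus biases of a fully connected network with the given layer widths."""
    return sum(i * o + o for i, o in zip(widths[:-1], widths[1:]))


def size_mib(n_params, bytes_per_param):
    """Storage size in MiB for n_params values of the given byte width."""
    return n_params * bytes_per_param / 2**20


def max_params_in_budget(budget_bytes, bytes_per_param):
    """Largest parameter count that fits in the byte budget."""
    return budget_bytes // bytes_per_param
```

For example, `max_params_in_budget(2**20, 4)` gives the number of float32 parameters that fit in the budget, and `size_mib(mlp_param_count([128, 256, 256, 48]), 4)` gives the size of a network with those assumed widths.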

Always-on Relocalization and Robustness to Dynamic Scenes


The proposed system continuously relocalizes and is inherently robust to changes in the scene without requiring any special components or explicit handling. It utilizes all available features in the field of view and automatically discards those that do not conform to the camera motion. This enables the system to operate seamlessly even when parts of the scene are changing, without affecting its performance.

Citation


@inproceedings{alzugaray2025aceslam,
    title={ACE-SLAM: Scene Coordinate Regression for Real-Time SLAM},
    author={Alzugaray, Ignacio and Taher, Marwan and Davison, Andrew J.},
    booktitle={arXiv},
    year={2025},
}