Semantic Exploration and Dense Mapping with Panoramic LiDAR–Camera Fusion

Introduction

This project presents a complete semantic exploration and dense mapping framework that enables a ground robot to autonomously explore, detect, and reconstruct target objects in large unknown environments.

The robot fuses panoramic camera and LiDAR data to build object-level dense semantic maps, integrating viewpoint planning and multi-view fusion to improve mapping completeness and accuracy.

Unlike conventional exploration strategies focused on free-space coverage, our approach explicitly considers semantic targets and object-level observation planning, balancing exploration efficiency and reconstruction quality.

Figure 1 – Dense semantic mapping result in lobby environment

System Overview

The framework consists of four main modules — Mapping, Local Sampler, Global Planner, and Safe Machine — operating in a closed exploration loop.
The robot incrementally constructs a semantic map, samples informative viewpoints, plans exploration paths, and executes them safely while maintaining consistent mapping and localization.
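The loop below is a minimal illustrative skeleton of that cycle. The class and method names (SemanticMap, LocalSampler, GlobalPlanner, SafeMachine) are placeholders for this sketch, not the actual interfaces of the implementation.

```cpp
// Illustrative skeleton of the closed exploration loop.
// All class and method names are hypothetical placeholders, not the real code base.
#include <iostream>
#include <vector>

struct Viewpoint { double x, y, yaw; double gain; };

struct SemanticMap   { void integrateScan() { /* fuse LiDAR + panoramic image (stub) */ }
                       bool explored() const { return false; } };
struct LocalSampler  { std::vector<Viewpoint> sample(const SemanticMap&) { return {}; } };
struct GlobalPlanner { std::vector<Viewpoint> sequence(std::vector<Viewpoint> v) { return v; } };
struct SafeMachine   { bool execute(const Viewpoint&) { return true; } };

int main() {
  SemanticMap map; LocalSampler sampler; GlobalPlanner planner; SafeMachine safety;
  // Closed loop: map -> sample viewpoints -> plan tour -> execute safely.
  while (!map.explored()) {
    map.integrateScan();                      // Mapping module
    auto candidates = sampler.sample(map);    // Local Sampler
    auto tour = planner.sequence(candidates); // Global Planner (ATSP + PRM)
    if (tour.empty()) break;                  // nothing informative left to visit
    if (!safety.execute(tour.front()))        // Safe Machine monitors execution
      continue;                               // re-plan on unsafe or invalid state
  }
  std::cout << "exploration finished\n";
  return 0;
}
```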

Figure 2 – Overview of the semantic exploration framework with four modules: Mapping, Sampler, Planning, and Safe Machine

Key Components

  • Mapping: Builds a real-time dense voxel map with LiDAR–camera fusion and semantic object reconstruction.
  • Local Sampler: Generates viewpoint candidates around exploration frontiers and object surfaces to maximize information gain.
  • Global Planner: Selects and sequences viewpoints by solving an ATSP over inter-viewpoint path costs, combined with PRM-based local path search (see the tour-construction sketch after this list).
  • Safe Machine: Monitors collision risks, invalid states, and execution stability to ensure safe and continuous operation.
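For the viewpoint sequencing step, the sketch below orders viewpoints over an asymmetric cost matrix with a greedy nearest-neighbor heuristic. This is a simplified stand-in for the actual ATSP solve: in practice the costs would come from PRM path searches between viewpoints, and the matrix would typically be handed to a dedicated ATSP solver; the toy matrix here is illustrative only.

```cpp
// Greedy nearest-neighbor tour over an asymmetric cost matrix.
// Simplified stand-in for an ATSP solver; costs stand for PRM path lengths.
#include <cstddef>
#include <iostream>
#include <limits>
#include <vector>

std::vector<std::size_t> greedyTour(const std::vector<std::vector<double>>& cost) {
  const std::size_t n = cost.size();
  std::vector<bool> visited(n, false);
  std::vector<std::size_t> tour{0};   // index 0 is the robot's current pose
  visited[0] = true;
  for (std::size_t step = 1; step < n; ++step) {
    std::size_t best = n;
    double bestCost = std::numeric_limits<double>::infinity();
    for (std::size_t j = 0; j < n; ++j) {
      if (!visited[j] && cost[tour.back()][j] < bestCost) {
        bestCost = cost[tour.back()][j];
        best = j;
      }
    }
    if (best == n) break;             // no reachable unvisited viewpoint
    visited[best] = true;
    tour.push_back(best);
  }
  return tour;
}

int main() {
  // Toy asymmetric matrix: entry [i][j] is the path cost from viewpoint i to j.
  std::vector<std::vector<double>> cost = {
      {0.0, 2.0, 9.0},
      {3.0, 0.0, 4.0},
      {8.0, 5.0, 0.0}};
  for (std::size_t idx : greedyTour(cost)) std::cout << idx << ' ';
  std::cout << '\n';                  // prints: 0 1 2
  return 0;
}
```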

My Contribution

  • LiDAR–Camera Registration: Extrinsic calibration and time-synchronized fusion between an Ouster LiDAR and a panoramic camera (C++ / ROS1); a minimal label-transfer sketch follows this list.
  • Semantic Mapping: YOLO + SAM2 segmentation to produce labeled point clouds and voxel-based object maps.
  • Coarse-to-Fine Reconstruction: Object model update, merging, and re-centering strategies ensure consistent, complete dense models.
  • Multi-View Integration: Fuses observations from different viewpoints to reduce occlusion and sensor noise.
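As an illustration of the label transfer behind the LiDAR–camera fusion: a LiDAR point is moved into the camera frame with the calibrated extrinsics, projected into the equirectangular panorama, and assigned the class of the segmentation mask at that pixel. The extrinsic matrix, image size, and mask contents below are toy placeholders, not calibrated values, and the equirectangular convention is one common choice rather than necessarily the one used in the code.

```cpp
// Project a LiDAR point into an equirectangular panorama and look up its label.
// Extrinsics and mask values are placeholders for illustration only.
#include <array>
#include <cmath>
#include <cstdint>
#include <iostream>
#include <utility>
#include <vector>

using Mat4 = std::array<std::array<double, 4>, 4>;
const double kPi = std::acos(-1.0);

// Transform a point from the LiDAR frame to the camera frame (homogeneous multiply).
std::array<double, 3> lidarToCamera(const Mat4& T, const std::array<double, 3>& p) {
  std::array<double, 3> q{};
  for (int r = 0; r < 3; ++r)
    q[r] = T[r][0] * p[0] + T[r][1] * p[1] + T[r][2] * p[2] + T[r][3];
  return q;
}

// Equirectangular projection: azimuth -> column, elevation -> row.
std::pair<int, int> projectEquirect(const std::array<double, 3>& pc, int width, int height) {
  const double azimuth = std::atan2(pc[1], pc[0]);                        // [-pi, pi]
  const double range = std::sqrt(pc[0]*pc[0] + pc[1]*pc[1] + pc[2]*pc[2]);
  const double elevation = std::asin(pc[2] / range);                      // [-pi/2, pi/2]
  const int u = static_cast<int>((0.5 - azimuth / (2.0 * kPi)) * width) % width;
  const int v = static_cast<int>((0.5 - elevation / kPi) * height);
  return {u, v};
}

int main() {
  const int W = 8, H = 4;                    // tiny panorama for the example
  std::vector<std::uint8_t> mask(W * H, 0);  // segmentation mask (e.g. from YOLO + SAM2)
  mask[1 * W + 3] = 7;                       // pretend class id 7 at pixel (u=3, v=1)

  Mat4 T = {{{1,0,0,0}, {0,1,0,0}, {0,0,1,0.1}, {0,0,0,1}}};  // placeholder extrinsics
  std::array<double, 3> pointLidar{1.0, 0.5, 0.2};

  const auto pc = lidarToCamera(T, pointLidar);
  const auto [u, v] = projectEquirect(pc, W, H);
  std::cout << "pixel (" << u << "," << v << ") label "
            << static_cast<int>(mask[v * W + u]) << "\n";
  return 0;
}
```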

Video 1 – LiDAR–camera registration and fusion demonstration


Benchmark Results

The exploration planner achieves significantly higher efficiency than baseline planners by prioritizing regions with semantic value and minimizing redundant motion.

Figure 3 – Benchmark of semantic exploration planning efficiency
  • Exploration time reduced by >40% compared to baseline planners
  • Viewpoint coverage improved by ~30% for semantic targets
  • Dense reconstruction completed in full in large-scale industrial environments

Real-World Results

The framework was validated in real-world construction and lobby environments.
It successfully reconstructed complex, cluttered scenes with high semantic consistency and centimeter-level geometric accuracy.

Quantitative evaluation of the reconstructed maps was conducted in two environments, comparing both whole-map and object-level results. Metrics include map completeness, mean distance, and standard deviation of geometric alignment against ground truth; a sketch of this evaluation follows Table 1.

Environment         Map Type     Completeness   Mean Dist (m)   Std (m)
Construction Site   Whole Map    99.65%         0.0227248       0.0651594
Construction Site   Object Map   87.03%         0.0310901       0.0413562
Lobby               Whole Map    99.72%         0.0050834       0.0048516
Lobby               Object Map   97.22%         0.0175095       0.0231272

Table 1 – Statistical evaluation of real-world dense reconstruction results.
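As a sketch of how such metrics can be computed (under one common convention, which may differ in detail from the evaluation used here): each ground-truth point is matched to its nearest reconstructed point, completeness is the fraction of ground-truth points within a distance threshold, and the mean and standard deviation summarize the nearest-neighbor distances. The 5 cm threshold, toy clouds, and brute-force search below are illustrative simplifications.

```cpp
// Cloud-to-cloud evaluation: completeness, mean and std of nearest-neighbor distances.
// Brute-force search for clarity; a k-d tree is needed at real map sizes.
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <iostream>
#include <limits>
#include <vector>

using Point = std::array<double, 3>;

double nearestDist(const Point& p, const std::vector<Point>& cloud) {
  double best = std::numeric_limits<double>::infinity();
  for (const Point& q : cloud) {
    const double dx = p[0]-q[0], dy = p[1]-q[1], dz = p[2]-q[2];
    best = std::min(best, dx*dx + dy*dy + dz*dz);
  }
  return std::sqrt(best);
}

int main() {
  std::vector<Point> groundTruth   = {{0,0,0}, {1,0,0}, {0,1,0}};          // toy reference cloud
  std::vector<Point> reconstructed = {{0.01,0,0}, {1.0,0.02,0}, {0.5,0.5,0}};
  const double threshold = 0.05;                                           // illustrative 5 cm cutoff

  double sum = 0.0, sumSq = 0.0;
  std::size_t covered = 0;
  for (const Point& p : groundTruth) {
    const double d = nearestDist(p, reconstructed);  // distance to closest reconstructed point
    sum += d; sumSq += d * d;
    if (d <= threshold) ++covered;
  }
  const double n = static_cast<double>(groundTruth.size());
  const double mean = sum / n;
  const double stddev = std::sqrt(std::max(0.0, sumSq / n - mean * mean));
  std::cout << "completeness " << 100.0 * covered / n << "%  "
            << "mean " << mean << " m  std " << stddev << " m\n";
  return 0;
}
```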

Figure 4 – Dense semantic reconstruction in a construction site environment
Figure 5 – Mapping error comparison in lobby environment

Video 2 – Final semantic exploration and mapping demonstration


Semantic Exploration and Dense Mapping of Complex Environments using Ground Robot with Panoramic LiDAR–Camera Fusion (IEEE RA-L 2025)