This is the official repository for the paper:
IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction
Hao Li*, Zhengyu Zou*, Fangfu Liu*, Xuanyang Zhang, Fangzhou Hong, Yukang Cao, Yushi Lan, Manyuan Zhang, Gang Yu, Dingwen Zhang†, and Ziwei Liu
IGGT introduces a novel transformer-based architecture for semantic 3D reconstruction that grounds instance-level understanding in geometric representations. Our method achieves state-of-the-art performance on multiple benchmarks while maintaining computational efficiency.
Key Features:
- 🎯 Instance-grounded 3D feature learning
- 🏗️ Geometry-aware transformer architecture
- 📊 State-of-the-art performance on ScanNet and InsScene-15K
- ⚡ Efficient inference with multi-view consistency
- [x] Release project paper
- [x] Release Benchmark (Segmentation, Track)
- [x] Release InsScene-15K dataset
- [ ] Release codebase
- [x] Release model code
- [ ] Release downstream task scripts
- [x] Release pretrained models
To set up the environment for this project, please follow these steps:

1. Create a new Conda environment with Python 3.10.0:

```bash
conda create -n iggt python=3.10.0
conda activate iggt
```

2. Install the required dependencies:

```bash
pip install -r requirements.txt
```

Note: To significantly accelerate clustering (DBSCAN), we highly recommend installing `cuml` from RAPIDS. Please refer to the official installation guide to choose the appropriate version for your system.
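As a minimal illustration of this optional speed-up, the snippet below (our own sketch, not part of the repository) checks at runtime whether cuML is installed and falls back to scikit-learn otherwise:

```python
import importlib.util

def pick_dbscan_backend():
    """Return the module providing DBSCAN: cuML (GPU) if installed, else scikit-learn (CPU)."""
    if importlib.util.find_spec("cuml") is not None:
        return "cuml.cluster"   # RAPIDS installed: GPU-accelerated DBSCAN
    return "sklearn.cluster"    # CPU fallback with a compatible estimator interface

print(pick_dbscan_backend())
```

Because cuML's `DBSCAN` mirrors scikit-learn's estimator API (`fit`, `labels_`), swapping between the two backends usually requires no other code changes.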
We provide `demo.py` to demonstrate IGGT's capabilities in 3D scene reconstruction and segmentation.
We provide sample scenes in the `iggt_demo` directory (e.g., `iggt_demo/demo1` to `iggt_demo/demo9`).
For your own data, please organize it with the following structure:
```
scene_name/
└── images/          # Input images (sorted by filename)
    ├── 00000.jpg
    ├── 00001.jpg
    └── ...
```
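For reference, frames in this layout can be collected with a short helper like the one below (a sketch we provide here for illustration; `demo.py` may load images differently):

```python
from pathlib import Path

def list_scene_images(scene_dir):
    """Return image paths under scene_dir/images, sorted by filename."""
    img_dir = Path(scene_dir) / "images"
    exts = {".jpg", ".jpeg", ".png"}
    return sorted(p for p in img_dir.iterdir() if p.suffix.lower() in exts)
```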
(Optional) For evaluation against ground truth:

```
scene_name/
├── depth/           # Ground truth depth maps
└── cam/             # Camera parameters (.npz files)
```
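The camera `.npz` files can be inspected with NumPy. The helper below is our own sketch: it simply reports whatever arrays a given archive stores, since the exact key names depend on how the data was exported:

```python
import numpy as np

def load_camera(npz_path):
    """Load all arrays from a camera .npz archive into a dict."""
    data = np.load(npz_path)
    print("available keys:", list(data.files))  # inspect what the file actually stores
    return {k: data[k] for k in data.files}
```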
Configure the paths in `demo.py`:

- `MODEL_PATH`: Path to the pretrained checkpoint.
- `TARGET_DIR`: Path to your input data directory.
- `SAVE_DIR`: Path where results will be saved.
You can also adjust the `CLUSTERING_CONFIG` in `demo.py` to optimize segmentation results:

- `eps`: DBSCAN epsilon parameter (default: 0.01). Controls the maximum distance between points to be considered neighbors.
- `min_samples`: Minimum samples for a core point (default: 100).
- `min_cluster_size`: Minimum size for a valid cluster (default: 500).
- `knn_k`: Number of neighbors for spatial smoothing (default: 20).
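To illustrate how a parameter like `min_cluster_size` can act on raw DBSCAN output, here is a hypothetical post-processing step that discards undersized clusters (the actual logic in `demo.py` may differ):

```python
import numpy as np

def filter_small_clusters(labels, min_cluster_size):
    """Relabel clusters smaller than min_cluster_size as noise (-1)."""
    labels = np.asarray(labels).copy()
    ids, counts = np.unique(labels[labels >= 0], return_counts=True)
    for cid, cnt in zip(ids, counts):
        if cnt < min_cluster_size:
            labels[labels == cid] = -1  # too small to be a valid instance
    return labels
```

With the default `min_cluster_size` of 500, any DBSCAN cluster containing fewer than 500 points would be treated as noise rather than a segmented instance.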
Then run the script:

```bash
python demo.py
```

The script will generate:

- 3D Visualizations: `.glb` files for RGB, Mask, and PCA features.
- Depth Maps: Visualizations with various colormaps in `pred_depths/`.
- Segmentation: DBSCAN and PCA masks in `dbscan_masks/` and `colored_pca/`.
Figure: Example 3D scene segmentation and reconstruction by IGGT.
If you find our code or paper helpful, please consider starring ⭐ us and citing:
@article{li2025iggt,
title={IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction},
author={Li, Hao and Zou, Zhengyu and Liu, Fangfu and Zhang, Xuanyang and Hong, Fangzhou and Cao, Yukang and Lan, Yushi and Zhang, Manyuan and Yu, Gang and Zhang, Dingwen and others},
journal={arXiv preprint arXiv:2510.22706},
year={2025}
}

This project is released under the MIT License. See LICENSE for details.

