Yuze Wang, Junyi Wang, Yue Qi
| Webpage | Pre-trained Models | Benchmark |
This repository contains the official implementation associated with the paper "Taking Language Embedded 3D Gaussian Splatting into the Wild". We also provide the PT-OVS benchmark and pretrained models for each scene.
The codebase consists of three main components:
- Optimizer: A PyTorch-based trainer that produces a MALE-GS model from SfM datasets with language feature inputs.
- Scene-wise Autoencoder: A module designed to alleviate the substantial memory demands of explicit high-dimensional modeling by compressing features.
- PT-OVS Benchmark: A specialized dataset for evaluating Open-Vocabulary Segmentation (OVS) in unconstrained "in-the-wild" environments.
The components have been tested on Ubuntu Linux 22.04. Instructions for setting up and running each of them are found in the sections below.
In the experiments section of our paper, we primarily use the proposed PT-OVS dataset.
The PT-OVS dataset can be downloaded via the following links:
- Download the original Phototourism dataset, which contains RGB images, the corresponding point clouds, and camera poses: 7 scenes in total (brandenburg_gate, buckingham_palace, notre_dame_front_facede, pantheon_exterior, taj_mahal, temple_nara_japan, trevi_fountain).
- Download our proposed PT-OVS benchmark labels and place them at the same level as the other scene folders.
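After downloading, the data might be arranged as follows (the benchmark label folder name below is a placeholder; only its placement at the same level as the scene folders matters):
<PT_dataset_root>
|---brandenburg_gate
|---buckingham_palace
|---notre_dame_front_facede
|---pantheon_exterior
|---taj_mahal
|---temple_nara_japan
|---trevi_fountain
|---<PT-OVS benchmark labels>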
- Cloning the Repository
Since the repository includes submodules, please clone it recursively:
# HTTPS
git clone https://github.com/yuzewang1998/takinglangsplatw.git --recursive
- Environment Setup
Our installation is based on Conda. We mainly follow the LangSplat environment setup.
conda env create --file environment.yml
conda activate malegs
Note: Please also install segment-anything-langsplat and download the SAM checkpoints to ckpts/ from the official repository.
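For example, the default ViT-H SAM checkpoint can be fetched like this (adjust the file name if you use a different SAM model size):
mkdir -p ckpts
wget -P ckpts https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth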
- Hardware Requirements
- CUDA-ready GPU with Compute Capability 7.0+
- 24 GB VRAM (to train to paper evaluation quality)
Download the pretrained models, which contain the reconstructed WE-GS models, the trained autoencoder checkpoint, and the trained MALE-GS checkpoints for each scene; with these you can evaluate the method directly:
python evaluate_iou_loc_pt.py \
--dataset_name ${CASE_NAME} \
--feat_dir ${root_path}/output/${exp_name} \
--ae_ckpt_dir ${root_path}/autoencoder/ckpt \
--output_dir ${root_path}/eval_result \
--mask_thresh 0.4 \
--encoder_dims 256 128 64 32 3 \
--decoder_dims 16 32 64 128 256 256 512 \
--json_folder ${gt_folder} \
--which_feature_fusion_func ${which_post_feature_fusion_func} \
--sky_filter
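The shell variables in the command above are placeholders; a minimal sketch of how they might be set (all paths and names below are illustrative, not fixed by the code) is:
root_path=~/takinglangsplatw                          # repository root (illustrative)
CASE_NAME=brandenburg_gate                            # one of the 7 PT-OVS scenes
exp_name=malegs_${CASE_NAME}                          # experiment name used at training time (illustrative)
gt_folder=<path to the downloaded PT-OVS benchmark labels>
which_post_feature_fusion_func=<fusion function name, see train_bash.sh>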
- Step 1: Train the radiance field.
You can use any 3DGS-based radiance field reconstruction method; we have tested vanilla 3DGS, GS-W, and WE-GS. A more advanced in-the-wild radiance field reconstruction method will lead to more accurate 3D OVS results. We recommend using a simplified WE-GS:
cd ~/we-gs/bash_train
./train_xxx.sh # remember to add --checkpoint_iteration 20000
The reconstructed model will be saved in /wegs/output/PT/xxx; move it to the PT dataset folder.
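For example (the scene and folder names here are illustrative), the move could look like:
# make the Step-1 reconstruction visible to the later MALE-GS steps, which expect
# ${scene_dir}/${reconstruction_case_name}/chkpnt20000.pth (names are illustrative)
mv ~/we-gs/output/PT/brandenburg_gate ${scene_dir}/${reconstruction_case_name}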
- Step 2: Generate language features and uncertainty maps for the scenes.
Modify `--dataset_path`, `--iteration`, `--itw_sh_degree`, `--itw_source_path`, and `--itw_model_path` (these arguments describe the reconstructed radiance field from Step 1).
./bash_prepprocess.sh
Because of the large number of images in an unconstrained photo collection, this step may take a long time, so we recommend using our provided checkpoints for a quick test.
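As a rough sketch, the arguments inside bash_prepprocess.sh could be set along these lines (the iteration matches the checkpoint saved in Step 1; the SH degree and paths are illustrative assumptions):
--dataset_path ${scene_dir}
--iteration 20000
--itw_sh_degree 3
--itw_source_path ${scene_dir}
--itw_model_path ${scene_dir}/${reconstruction_case_name}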
- Step 3: Train the uncertainty-aware autoencoder and obtain the low-dimensional features.
You can refer to train_bash.sh for the full argument settings.
# train the autoencoder
cd autoencoder
python train.py --dataset_path ${scene_dir} --dataset_name ${CASE_NAME} \
--train_feature_func default --num_epochs 100 \
--train_with_uncertainly_map --fusion_uncertainly_map_func direct_multiply
# get the compressed language feature of the scene
python test.py --dataset_path ${scene_dir} --dataset_name ${CASE_NAME} --train_feature_func default
Our model expects the following dataset structure in the source path location, similar to MALE-GS:
<dataset_name>
|---images
|   |---<image 0>
|   |---<image 1>
|   |---...
|---language_feature
|   |---00_f.npy
|   |---00_s.npy
|   |---...
|---language_feature_dim3
|   |---00_f.npy
|   |---00_s.npy
|   |---...
|---output
|   |---<dataset_name>
|   |   |---point_cloud/iteration_30000/point_cloud.ply
|   |   |---cameras.json
|   |   |---cfg_args
|   |   |---chkpnt30000.pth
|   |   |---input.ply
|---sparse
    |---0
        |---cameras.bin
        |---images.bin
        |---points3D.bin
- Step 4: Train the MALE-GS.
You can refer to train_bash.sh for the full argument settings.
python train.py -s ${scene_dir} -m ./output/${exp_name}/${CASE_NAME} \
--start_checkpoint ${scene_dir}/${reconstruction_case_name}/chkpnt${ckpt_iter}.pth \
--feature_level 1 --include_feature --resolution 2 \
--which_feature_fusion_func ${which_feature_fusion_func} \
--language_features_name language_features_dim3_${CASE_NAME} \
--iterations 30_000
- Step 5: Render the MALE-GS.
python render.py -s ${scene_dir} -m ./output/${exp_name}/${CASE_NAME}_1 \
--feature_level 1 --include_feature --resolution 2 \
--language_features_name language_features_dim3_${CASE_NAME} \
--which_feature_fusion_func ${which_feature_fusion_func} \
--skip_train --skip_test --render_small_batch
- Step 6: Evaluate the performance on the PT-OVS benchmark. You can refer to train_bash.sh for the full argument settings.
python evaluate_iou_loc_pt.py \
--dataset_name ${CASE_NAME} \
--feat_dir ${root_path}/output/${exp_name} \
--ae_ckpt_dir ${root_path}/autoencoder/ckpt \
--output_dir ${root_path}/eval_result \
--mask_thresh 0.4 \
--encoder_dims 256 128 64 32 3 \
--decoder_dims 16 32 64 128 256 256 512 \
--json_folder ${gt_folder} \
--which_feature_fusion_func ${which_post_feature_fusion_func} \
--sky_filter
