danieldoh/OTA_3DHOI

Occlusion-Aware Temporally Consistent Amodal Completion for 3D Human-Object Interaction Reconstruction [ACM MM 2025]

Hyungjun Doh¹*, Dong In Lee¹,²*, Seunggeun Chi¹*, Pin-Hao Huang³, Kwonjoon Lee³,
Sangpil Kim²†, Karthik Ramani¹†

¹Purdue University,   ²Korea University,   ³Honda Research Institute USA

Temporally Consistent Amodal Completion for Human-Object Interaction

This repository contains the experiment pipeline for producing temporally consistent amodal RGB completions of an interacting person or object, followed by optional 3D Gaussian reconstruction and joint human-object rendering.

The workflow supports BEHAVE and InterCap. It integrates external research repositories instead of redistributing their full source code:

  • HDM for template-free human-object point cloud prediction.
  • BEHAVE dataset tools for BEHAVE sequence, calibration, mask, and pose access.
  • InterCap for InterCap data and annotations.
  • SEA-RAFT for optical flow.
  • GSPose for object 3D Gaussian reconstruction.
  • GaussianAvatar for human Gaussian reconstruction and final rendering.

Repository layout

.
├── preprocessing/
│   ├── README.md
│   └── validate_layout.py
├── hdm/
│   ├── project_masks.py
│   ├── alpha_shape_masks.py
│   ├── run_hdm.py
│   └── run_hdm.sh
├── amodal_completion/
│   ├── configs/
│   ├── scripts/
│   ├── main.py
│   ├── sea_raft_flow.py
│   ├── utils.py
│   └── inpainting.py
└── reconstruction/
    ├── object/
    ├── human/
    ├── rendering/
    └── install_integrations.sh

Pipeline

The complete processing order is:

dataset video and annotations
        |
        v
common per-frame layout
        |
        v
HDM point prediction -> projected dots -> alpha-shape concave masks
        |
        v
mask fusion and 512 x 512 preprocessing
        |
        v
SEA-RAFT temporal optical flow
        |
        v
VAE feature extraction -> flow warping -> temporal attention fusion
        |
        v
Stable Diffusion amodal completion
        |
        +--> GSPose object reconstruction
        +--> GaussianAvatar human reconstruction
        +--> combined human-object rendering

Environments

Do not install the complete workflow in a single Python environment: HDM, SEA-RAFT, GSPose, GaussianAvatar, and the dataset loaders have different PyTorch, CUDA, and PyTorch3D requirements.

Recommended environments:

  1. hdm: use the versions documented by HDM.
  2. amodal: use Python 3.10 and install this repository's pinned requirements.txt. If the pinned PyTorch wheel does not match the server's CUDA driver, install the corresponding torch==2.4.1 and torchvision==0.19.1 wheels from the official PyTorch index first.
  3. sea-raft: use the official SEA-RAFT environment.
  4. gspose: use the official GSPose environment.
  5. gaussian-avatar: use the official GaussianAvatar environment.
  6. behave or intercap: use the official dataset repository environment for dataset conversion and pose extraction.

Do not commit dataset files, model checkpoints, generated masks, latent features, optical flow, completed images, or reconstructed models.

1. Prepare videos

Follow preprocessing/README.md to export BEHAVE or InterCap frames. Both datasets must be converted to this common layout:

/path/to/preprocessed/
└── VIDEO_NAME/
    ├── 00000/
    │   ├── 00000.color.jpg
    │   ├── 00000.obj_mask.png
    │   ├── 00000.person_mask.png
    │   └── HDM_mask/
    ├── 00001/
    └── ...
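
As an illustration of the layout above, the following sketch lists the expected per-frame files that are missing for one video. The helper name `missing_frame_files` is hypothetical and this is not the repository's preprocessing/validate_layout.py; it only assumes the three file names shown in the tree.

```python
from pathlib import Path

# File suffixes taken from the layout above: each frame folder NNNNN is
# expected to hold NNNNN.<suffix> for every suffix listed here.
REQUIRED_SUFFIXES = ("color.jpg", "obj_mask.png", "person_mask.png")


def missing_frame_files(video_dir):
    """Return the expected files that do not exist under video_dir (hypothetical helper)."""
    missing = []
    frame_dirs = sorted(p for p in Path(video_dir).iterdir() if p.is_dir())
    for frame_dir in frame_dirs:
        for suffix in REQUIRED_SUFFIXES:
            expected = frame_dir / f"{frame_dir.name}.{suffix}"
            if not expected.is_file():
                missing.append(expected)
    return missing
```

An empty return value means every frame folder contains its RGB image and both visible masks. The HDM_mask/ folder is not checked here because it is filled in a later step.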

Create a JSON manifest:

{
  "video_folders": [
    {
      "folder": "VIDEO_NAME",
      "prompt": "object description"
    }
  ]
}

Example manifests are provided in amodal_completion/configs/.
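
If you prefer to generate the manifest from a script, a minimal sketch could look like the following. The function names `build_manifest` and `check_manifest` are hypothetical; only the `video_folders`, `folder`, and `prompt` keys come from the example above.

```python
import json


def build_manifest(entries):
    """Build the manifest dict from (folder, prompt) pairs."""
    return {
        "video_folders": [
            {"folder": folder, "prompt": prompt} for folder, prompt in entries
        ]
    }


def check_manifest(manifest):
    """Raise ValueError if the manifest lacks the keys shown in the example."""
    folders = manifest.get("video_folders")
    if not isinstance(folders, list) or not folders:
        raise ValueError("manifest needs a non-empty 'video_folders' list")
    for item in folders:
        for key in ("folder", "prompt"):
            if not isinstance(item.get(key), str) or not item[key]:
                raise ValueError(f"entry {item!r} needs a non-empty '{key}' string")


manifest = build_manifest([("VIDEO_NAME", "object description")])
check_manifest(manifest)
manifest_text = json.dumps(manifest, indent=2)  # write this text to dataset.json
```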

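BEHAVE uses the half-open frame range [90, 1497), that is frames 90 through 1496, when those frames are available. Each InterCap sequence is truncated so that its length is the largest multiple of 16 frames that fits.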
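
The two frame-selection rules can be sketched as follows. This is an illustration under two assumptions that the text does not state: short BEHAVE sequences are clipped to the frames that exist, and InterCap truncation keeps the leading frames. The function names are hypothetical.

```python
BEHAVE_START, BEHAVE_STOP = 90, 1497  # half-open range [90, 1497) from the text
INTERCAP_MULTIPLE = 16


def behave_frame_indices(num_frames):
    """Indices in [90, 1497), clipped to the frames that actually exist (assumption)."""
    start = min(BEHAVE_START, num_frames)
    stop = min(BEHAVE_STOP, num_frames)
    return list(range(start, stop))


def intercap_frame_indices(num_frames):
    """Leading frames, truncated to the largest multiple of 16 (assumption: keep the start)."""
    kept = (num_frames // INTERCAP_MULTIPLE) * INTERCAP_MULTIPLE
    return list(range(kept))
```

For example, a 100-frame InterCap sequence keeps 96 frames, and a full-length BEHAVE sequence keeps 1497 - 90 = 1407 frames.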

2. Generate HDM masks

Set up the official HDM repository first. The wrapper copies hdm/project_masks.py into that checkout and runs it with the official HDM configuration and model code.

HDM_ROOT=/path/to/HDM \
FRAMES_ROOT=/path/to/preprocessed \
CONFIG_PATH=/path/to/dataset.json \
GPU_ID=0 \
./hdm/run_hdm.sh

For each frame, HDM predicts human and object point clouds. The wrapper projects the two point sets into the input camera and writes:

HDM_mask/00000_obj_pj_mask.png
HDM_mask/00000_human_pj_mask.png
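
To make the projection step concrete, here is a minimal sketch that rasterizes camera-space points into a mask of projected dots. It assumes an undistorted pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy`; the function name is hypothetical and the actual camera handling in hdm/project_masks.py may differ.

```python
def project_points_to_mask(points, fx, fy, cx, cy, width, height):
    """Rasterize camera-space 3D points into a binary mask of projected dots.

    points: iterable of (x, y, z) in the camera frame, with z pointing forward.
    Returns a height x width list of lists holding 0 or 255.
    """
    mask = [[0] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0:  # behind the camera, cannot be projected
            continue
        # Pinhole projection (assumed model, no lens distortion).
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            mask[v][u] = 255
    return mask
```

Because each point marks a single pixel, the result is a sparse set of dots rather than a filled region.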

The same command then applies a 2D alpha shape to connect the sparse projected pixels into filled concave masks:

HDM_mask/00000_obj_concave_mask.png
HDM_mask/00000_human_concave_mask.png

The default alpha value is 0.07. A denser or noisier projection may require a different --alpha value.
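
For intuition about what the alpha value controls, the sketch below shows the per-triangle test used by one common alpha-shape convention: a Delaunay triangle is kept when its circumradius is below 1 / alpha. Whether hdm/alpha_shape_masks.py follows this convention, and in which coordinate units, is an assumption; the function names are hypothetical and the Delaunay triangulation itself is not shown.

```python
import math


def circumradius(a, b, c):
    """Circumradius of the 2D triangle with vertices a, b, c."""
    la, lb, lc = math.dist(b, c), math.dist(a, c), math.dist(a, b)
    # Twice the signed area, from the cross product of two edge vectors.
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    area = abs(cross) / 2.0
    if area == 0.0:
        return math.inf  # degenerate (collinear) triangle
    return la * lb * lc / (4.0 * area)


def keep_triangle(a, b, c, alpha=0.07):
    """Keep a triangle when its circumradius is below 1 / alpha (assumed convention)."""
    return circumradius(a, b, c) < 1.0 / alpha
```

Under this convention a larger alpha lowers the radius threshold, so fewer large triangles survive and the mask hugs the points more tightly.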

Validate the resulting dataset:

python preprocessing/validate_layout.py \
  --frames-root /path/to/preprocessed \
  --config /path/to/dataset.json

3. Preprocess images and masks

The preprocessing stage:

  • unions the visible segmentation with the HDM concave mask;
  • computes one sequence-level crop;
  • resizes RGB and masks to 512 x 512;
  • creates object-only and person-only RGB inputs;
  • computes the occluded region to inpaint.
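
The mask operations in this list can be sketched with NumPy as follows. This is an illustration, not the repository's code: it assumes that the occluded region is the fused mask minus the visible segmentation and that the sequence-level crop is the bounding box of all fused masks. The function names and the `margin` parameter are hypothetical, and the 512 x 512 resize is omitted.

```python
import numpy as np


def fuse_masks(visible, hdm_concave):
    """Union of the visible segmentation and the HDM concave mask (boolean arrays)."""
    return np.logical_or(visible, hdm_concave)


def occluded_region(fused, visible):
    """Pixels inside the fused mask that the visible segmentation does not cover."""
    return np.logical_and(fused, np.logical_not(visible))


def sequence_crop(fused_masks, margin=0):
    """One (top, bottom, left, right) box covering the fused masks of every frame."""
    any_frame = np.any(np.stack(fused_masks), axis=0)
    rows = np.flatnonzero(any_frame.any(axis=1))
    cols = np.flatnonzero(any_frame.any(axis=0))
    if rows.size == 0:
        raise ValueError("all masks are empty")
    height, width = any_frame.shape
    top = max(int(rows[0]) - margin, 0)
    bottom = min(int(rows[-1]) + 1 + margin, height)
    left = max(int(cols[0]) - margin, 0)
    right = min(int(cols[-1]) + 1 + margin, width)
    return top, bottom, left, right
```

Using one crop for the whole sequence keeps the subject at a stable position and scale across frames.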

For BEHAVE:

cd amodal_completion

FRAMES_ROOT=/path/to/behave/preprocessed \
WORK_ROOT=/path/to/behave/work \
INPAINTED_ROOT=/path/to/behave/completed \
CONFIG_PATH=/path/to/behave.json \
SUBJECT=obj \
PIPELINE_STAGES="--preprocess" \
./scripts/run_behave.sh

For InterCap, use ./scripts/run_intercap.sh with the corresponding paths.

SUBJECT may be obj or person.
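
If you drive the launcher from Python, for example to preprocess both subjects in turn, a sketch along these lines may help. The environment variable names are the ones used above; the helper names `launcher_env` and `run_preprocess_for_both_subjects` are hypothetical.

```python
import os
import subprocess


def launcher_env(frames_root, work_root, inpainted_root, config_path,
                 subject, stages=None, base_env=None):
    """Build the environment for the launcher script from explicit paths."""
    if subject not in ("obj", "person"):
        raise ValueError("SUBJECT must be 'obj' or 'person'")
    env = dict(os.environ if base_env is None else base_env)
    env.update(
        FRAMES_ROOT=str(frames_root),
        WORK_ROOT=str(work_root),
        INPAINTED_ROOT=str(inpainted_root),
        CONFIG_PATH=str(config_path),
        SUBJECT=subject,
    )
    if stages is not None:
        env["PIPELINE_STAGES"] = stages
    else:
        # Leaving the variable unset lets the launcher apply its own default.
        env.pop("PIPELINE_STAGES", None)
    return env


def run_preprocess_for_both_subjects(paths, script="./scripts/run_behave.sh"):
    """Run the --preprocess stage once per subject; call from amodal_completion/."""
    for subject in ("obj", "person"):
        env = launcher_env(subject=subject, stages="--preprocess", **paths)
        subprocess.run([script], env=env, check=True)
```

Here `paths` is a dict with the keys `frames_root`, `work_root`, `inpainted_root`, and `config_path`.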

4. Calculate optical flow

Set up SEA-RAFT and select its config and checkpoint. For each frame, optical flow is computed to seven sampled past frames and seven sampled future frames. Forward and backward flows are resized to the 64 x 64 latent grid, which is the 512 x 512 input downsampled by the Stable Diffusion VAE factor of 8.

cd amodal_completion

SEA_RAFT_ROOT=/path/to/SEA-RAFT \
SEA_RAFT_CFG=/path/to/SEA-RAFT/config/eval/spring-M.json \
SEA_RAFT_CHECKPOINT=/path/to/SEA-RAFT/checkpoint.pth \
FRAMES_ROOT=/path/to/behave/preprocessed \
WORK_ROOT=/path/to/behave/work \
CONFIG_PATH=/path/to/behave.json \
DEVICE=cuda:0 \
./scripts/run_flow_behave.sh

For InterCap, run ./scripts/run_flow_intercap.sh.

Each video produces:

WORK_ROOT/VIDEO_NAME/flow_dicts_past_future.pt
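
The neighbor sampling and latent-grid resizing in this step can be sketched as follows. Several details are assumptions: the uniform `stride`, the clamping at sequence boundaries, and the (horizontal, vertical) channel order. The sketch also uses block averaging, which is a simple stand-in and may differ from the interpolation used in amodal_completion/sea_raft_flow.py. Function names are hypothetical.

```python
import numpy as np


def sample_neighbors(t, num_frames, count=7, stride=1):
    """Return (past, future) frame indices around t, clamped to the sequence."""
    past = [max(t - k * stride, 0) for k in range(count, 0, -1)]
    future = [min(t + k * stride, num_frames - 1) for k in range(1, count + 1)]
    return past, future


def resize_flow(flow, out_size=64):
    """Downsample a (2, H, W) flow field to (2, out_size, out_size).

    Uses block averaging, so H and W must be multiples of out_size. The
    displacements are rescaled so they are expressed in output pixels.
    """
    _, height, width = flow.shape
    fy, fx = height // out_size, width // out_size
    if fy * out_size != height or fx * out_size != width:
        raise ValueError("flow size must be a multiple of out_size")
    blocks = flow.reshape(2, out_size, fy, out_size, fx).mean(axis=(2, 4))
    blocks[0] /= fx  # channel 0: horizontal displacement (assumed order)
    blocks[1] /= fy  # channel 1: vertical displacement (assumed order)
    return blocks
```

The rescaling matters: a displacement of 8 pixels at 512 x 512 corresponds to 1 pixel on the 64 x 64 grid.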

5. Run amodal completion

Run the remaining stages after optical flow is available:

cd amodal_completion

FRAMES_ROOT=/path/to/behave/preprocessed \
WORK_ROOT=/path/to/behave/work \
INPAINTED_ROOT=/path/to/behave/completed \
CONFIG_PATH=/path/to/behave.json \
SUBJECT=obj \
DEVICE=cuda:0 \
PIPELINE_STAGES="--extract-features --warp --fuse --inpaint" \
./scripts/run_behave.sh

The stages produce:

WORK_ROOT/VIDEO_NAME/
├── vae_features.npy
├── flow_dicts_past_future.pt
├── warped_feature_past_future.pt
└── obj_cross_attn_feature_past_future.pt

INPAINTED_ROOT/VIDEO_NAME/
└── 00000.inpainted.jpg
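
To give a rough idea of what the --warp and --fuse stages compute, here is a parameter-free NumPy sketch: nearest-neighbor backward warping of a latent feature map, followed by a softmax-weighted average over the warped neighbors. It is not the repository's implementation, whose attention module may be learned and structured differently; the function names and the (horizontal, vertical) flow channel order are assumptions.

```python
import numpy as np


def warp_feature(feature, flow):
    """Backward-warp a (C, H, W) feature map with a (2, H, W) flow field.

    Output pixel (y, x) takes the value at (y + flow[1], x + flow[0]) in the
    neighbor's feature map, using nearest-neighbor lookup with edge clamping.
    """
    _, height, width = feature.shape
    ys, xs = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    src_x = np.clip(np.rint(xs + flow[0]).astype(int), 0, width - 1)
    src_y = np.clip(np.rint(ys + flow[1]).astype(int), 0, height - 1)
    return feature[:, src_y, src_x]


def fuse_features(current, warped_neighbors):
    """Softmax-weighted average of warped neighbor features, per pixel.

    Weights come from the scaled dot product between the current feature
    (query) and each warped neighbor (key) over the channel axis.
    """
    neighbors = np.stack(warped_neighbors)  # (N, C, H, W)
    scores = (neighbors * current[None]).sum(axis=1) / np.sqrt(current.shape[0])
    scores = scores - scores.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=0, keepdims=True)
    return (weights[:, None] * neighbors).sum(axis=0)
```

Warping first aligns each neighbor with the current frame, so the fusion compares features that describe the same image location.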

To run preprocessing and all completion stages in one command after flow has already been generated, leave PIPELINE_STAGES unset. The launcher defaults to --all.

6. Reconstruct the object

See reconstruction/README.md for full details.

The object path is:

  1. Generate img_obj_poses.json from BEHAVE poses and camera calibration.
  2. Install the provided GSPose dataset/training adapters.
  3. Run my_gs_demo.py, optionally with --use_complete to train from amodally completed RGB images and masks.
  4. Export 3DGO_model.ply.

Install the GSPose adapters:

GSPOSE_ROOT=/path/to/GSPose \
GAUSSIAN_AVATAR_ROOT=/path/to/GaussianAvatar \
./reconstruction/install_integrations.sh

Then, from the GSPose repository:

python notebook/my_gs_demo.py \
  --data-pth /path/to/sequence/img_obj_poses.json \
  --resolution 1 \
  --use_complete

7. Reconstruct the person

The human path uses GaussianAvatar:

  1. Run reconstruction/human/make_data.py to build GaussianAvatar images, masks, SMPL-H fits, and camera parameters from BEHAVE.
  2. Convert SMPL-H to SMPL with the official SMPL-X transfer tool.
  3. Run pkl_to_smpl_params.py to create smpl_parms.pth.
  4. Train GaussianAvatar according to its official README.

8. Render the reconstructed interaction

The original rendering code is retained under reconstruction/rendering/ because it combines the object-only GSPose model with the human GaussianAvatar model.

The rendering path is:

  1. Generate 30 fps SMPL and object poses with parse_obj_pose_10fps.py.
  2. Build novel-pose inputs with make_behave_novel_pose.py.
  3. Convert 3DGO_model.ply to the GaussianAvatar object tensor with my_gs_avatar_obj_save.py.
  4. Place that tensor in the novel-pose directory as object_gaussians.pth.
  5. Run my_render_novel_pose.py from the GaussianAvatar checkout.

Example:

python my_render_novel_pose.py \
  -s /path/to/avatar/training/data \
  -m /path/to/avatar/model/output \
  --epoch 180 \
  --my_test_folder /path/to/novel_pose/VIDEO_NAME/1

Path configuration

amodal_completion/main.py deliberately contains placeholder defaults:

Path("/path/to/behave/preprocessed")
Path("/path/to/behave/work")
Path("/path/to/intercap/preprocessed")
Path("/path/to/intercap/work")

Pass real paths through the launchers or command-line arguments. Do not commit private server paths in place of these defaults.
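
One way to catch a forgotten placeholder early is to reject any path that still lives under /path/to. The sketch below shows the idea with argparse; the flag names `--frames-root` and `--work-root` and the helper names are hypothetical, and the real arguments of amodal_completion/main.py may differ.

```python
import argparse
from pathlib import Path

PLACEHOLDER_PREFIX = Path("/path/to")


def is_placeholder(path):
    """True when path is /path/to or lies anywhere beneath it."""
    path = Path(path)
    return PLACEHOLDER_PREFIX in (path, *path.parents)


def parse_paths(argv=None):
    """Parse two example path flags and refuse placeholder values."""
    parser = argparse.ArgumentParser()
    # Hypothetical flag names, used only for illustration.
    parser.add_argument("--frames-root", type=Path,
                        default=Path("/path/to/behave/preprocessed"))
    parser.add_argument("--work-root", type=Path,
                        default=Path("/path/to/behave/work"))
    args = parser.parse_args(argv)
    for name in ("frames_root", "work_root"):
        if is_placeholder(getattr(args, name)):
            parser.error(f"--{name.replace('_', '-')} is still a placeholder path")
    return args
```

With this guard, running without real paths exits with a clear error instead of failing later on a missing directory.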

Notes and limitations

  • This repository does not include datasets, checkpoints, SMPL/SMPL-X models, or code owned by the linked external repositories.
  • The BEHAVE frame range [90, 1497) and the InterCap truncation to a multiple of 16 frames used in the experiments are preserved.
  • HDM object categories without a class-specific checkpoint use the general checkpoint and std_coverage=3.5.
  • CUDA execution is expected for HDM, SEA-RAFT, diffusion, GSPose, and GaussianAvatar.
  • Review the licenses and dataset terms of every upstream project before redistributing derived code or data.

Acknowledgements

This code builds on HDM, BEHAVE, InterCap, SEA-RAFT, Stable Diffusion, Diffusers, GSPose, GaussianAvatar, SMPL/SMPL-X, PyTorch, and PyTorch3D. Cite the corresponding papers and repositories when using their components.
