Occlusion-Aware Temporally Consistent Amodal Completion for 3D Human-Object Interaction Reconstruction [ACM MM 2025]
Hyungjun Doh1*, Dong In Lee1,2*, Seunggeun Chi1*, Pin-Hao Huang3, Kwonjoon Lee3,
Sangpil Kim2†, Karthik Ramani1†
1Purdue University, 2Korea University, 3Honda Research Institute USA
This repository contains the experiment pipeline for producing temporally consistent amodal RGB completions of an interacting person or object, followed by optional 3D Gaussian reconstruction and joint human-object rendering.
The workflow supports BEHAVE and InterCap. It integrates external research repositories instead of redistributing their full source code:
- HDM for template-free human-object point cloud prediction.
- BEHAVE dataset tools for BEHAVE sequence, calibration, mask, and pose access.
- InterCap for InterCap data and annotations.
- SEA-RAFT for optical flow.
- GSPose for object 3D Gaussian reconstruction.
- GaussianAvatar for human Gaussian reconstruction and final rendering.
.
├── preprocessing/
│   ├── README.md
│   └── validate_layout.py
├── hdm/
│   ├── project_masks.py
│   ├── alpha_shape_masks.py
│   ├── run_hdm.py
│   └── run_hdm.sh
├── amodal_completion/
│   ├── configs/
│   ├── scripts/
│   ├── main.py
│   ├── sea_raft_flow.py
│   ├── utils.py
│   └── inpainting.py
└── reconstruction/
    ├── object/
    ├── human/
    ├── rendering/
    └── install_integrations.sh
The complete processing order is:
dataset video and annotations
|
v
common per-frame layout
|
v
HDM point prediction -> projected dots -> alpha-shape concave masks
|
v
mask fusion and 512 x 512 preprocessing
|
v
SEA-RAFT temporal optical flow
|
v
VAE feature extraction -> flow warping -> temporal attention fusion
|
v
Stable Diffusion amodal completion
|
+--> GSPose object reconstruction
+--> GaussianAvatar human reconstruction
+--> combined human-object rendering
The complete workflow should not be installed in one Python environment. HDM, SEA-RAFT, GSPose, GaussianAvatar, and dataset loaders have different PyTorch, CUDA, and PyTorch3D requirements.
Recommended environments:
- hdm: use the versions documented by HDM.
- amodal: use Python 3.10 and install this repository's pinned requirements.txt. If the pinned PyTorch wheel does not match the server's CUDA driver, install the corresponding torch==2.4.1 and torchvision==0.19.1 wheels from the official PyTorch index first.
- sea-raft: use the official SEA-RAFT environment.
- gspose: use the official GSPose environment.
- gaussian-avatar: use the official GaussianAvatar environment.
- behave or intercap: use the official dataset repository environment for dataset conversion and pose extraction.
Do not commit dataset files, model checkpoints, generated masks, latent features, optical flow, completed images, or reconstructed models.
Follow preprocessing/README.md to export BEHAVE or InterCap frames. Both datasets must be converted to this common layout:
/path/to/preprocessed/
└── VIDEO_NAME/
    ├── 00000/
    │   ├── 00000.color.jpg
    │   ├── 00000.obj_mask.png
    │   ├── 00000.person_mask.png
    │   └── HDM_mask/
    ├── 00001/
    └── ...
Create a JSON manifest:
{
  "video_folders": [
    {
      "folder": "VIDEO_NAME",
      "prompt": "object description"
    }
  ]
}
Example manifests are provided in amodal_completion/configs/.
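A manifest with this shape can also be generated from a script. The sketch below is a minimal illustration using only the standard library; the helper name `write_manifest` and its signature are hypothetical and not part of this repository.

```python
import json
from pathlib import Path


def write_manifest(entries, out_path):
    """Write a manifest with one {"folder", "prompt"} record per video.

    `entries` is an iterable of (folder, prompt) pairs. This helper is
    illustrative only; the repository does not ship it.
    """
    manifest = {
        "video_folders": [
            {"folder": folder, "prompt": prompt} for folder, prompt in entries
        ]
    }
    # Write pretty-printed JSON with a trailing newline.
    Path(out_path).write_text(json.dumps(manifest, indent=2) + "\n", encoding="utf-8")
    return manifest
```

For example, `write_manifest([("VIDEO_NAME", "object description")], "dataset.json")` writes the manifest shown above.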
BEHAVE uses frames [90, 1497) when available. InterCap is truncated to the
largest multiple of 16 frames.
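The two frame-selection rules can be sketched as follows. This is an illustration under two assumptions that the text does not confirm: "when available" is read as keeping whatever part of [90, 1497) a shorter sequence contains, and the InterCap truncation is assumed to drop frames from the end. The function name `select_frames` is hypothetical.

```python
BEHAVE_START, BEHAVE_END = 90, 1497  # half-open range [90, 1497)
INTERCAP_CHUNK = 16


def select_frames(frames, dataset):
    """Return the frames kept by the stated rule for `dataset`.

    Assumption: slicing semantics handle short BEHAVE sequences, and
    InterCap keeps the leading frames when truncating.
    """
    if dataset == "behave":
        return frames[BEHAVE_START:BEHAVE_END]
    if dataset == "intercap":
        # Largest multiple of 16 that fits in the sequence.
        usable = len(frames) - len(frames) % INTERCAP_CHUNK
        return frames[:usable]
    raise ValueError(f"unknown dataset: {dataset!r}")
```

With a 50-frame InterCap sequence this keeps 48 frames; with a 2000-frame BEHAVE sequence it keeps frames 90 through 1496.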
Set up the official HDM repository first. The wrapper copies
hdm/project_masks.py into that checkout and runs it with the official HDM
configuration and model code.
HDM_ROOT=/path/to/HDM \
FRAMES_ROOT=/path/to/preprocessed \
CONFIG_PATH=/path/to/dataset.json \
GPU_ID=0 \
./hdm/run_hdm.sh
For each frame, HDM predicts human and object point clouds. The wrapper projects the two point sets into the input camera and writes:
HDM_mask/00000_obj_pj_mask.png
HDM_mask/00000_human_pj_mask.png
The same command then uses a 2D alpha shape to connect the projected pixels:
HDM_mask/00000_obj_concave_mask.png
HDM_mask/00000_human_concave_mask.png
The default alpha value is 0.07. A denser or noisier projection may require a
different --alpha value.
Validate the resulting dataset:
python preprocessing/validate_layout.py \
--frames-root /path/to/preprocessed \
--config /path/to/dataset.json
The preprocessing stage:
- unions the visible segmentation with the HDM concave mask;
- computes one sequence-level crop;
- resizes RGB and masks to 512 x 512;
- creates object-only and person-only RGB inputs;
- computes the occluded region to inpaint.
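As an illustration of the sequence-level crop step, the sketch below unions per-frame bounding boxes and returns one square window that is reused for every frame. It is an assumption about how such a crop could be computed, not the repository's implementation; `sequence_crop`, its `margin` parameter, and the choice of a square window (convenient before resizing to 512 x 512) are all hypothetical.

```python
def sequence_crop(boxes, image_size, margin=0):
    """Union per-frame boxes into one square crop clamped to the image.

    boxes: iterable of (x0, y0, x1, y1) with x1 and y1 exclusive.
    image_size: (width, height) of the source frames.
    Illustrative only; the actual crop logic may differ.
    """
    boxes = list(boxes)
    if not boxes:
        raise ValueError("at least one box is required")
    width, height = image_size
    x0 = min(b[0] for b in boxes) - margin
    y0 = min(b[1] for b in boxes) - margin
    x1 = max(b[2] for b in boxes) + margin
    y1 = max(b[3] for b in boxes) + margin
    # Use the longer side so the crop is square, capped at the image size.
    side = min(max(x1 - x0, y1 - y0), width, height)
    cx, cy = (x0 + x1) // 2, (y0 + y1) // 2
    # Shift the square back inside the image if it would stick out.
    left = min(max(cx - side // 2, 0), width - side)
    top = min(max(cy - side // 2, 0), height - side)
    return left, top, left + side, top + side
```

Using one window for the whole sequence keeps the subject at a fixed scale and position across frames, which is why a per-frame crop is avoided here.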
For BEHAVE:
cd amodal_completion
FRAMES_ROOT=/path/to/behave/preprocessed \
WORK_ROOT=/path/to/behave/work \
INPAINTED_ROOT=/path/to/behave/completed \
CONFIG_PATH=/path/to/behave.json \
SUBJECT=obj \
PIPELINE_STAGES="--preprocess" \
./scripts/run_behave.sh
For InterCap, use ./scripts/run_intercap.sh with the corresponding paths.
SUBJECT may be obj or person.
Set up SEA-RAFT and select its config and checkpoint. Optical flow is computed
between each current frame and seven sampled past plus seven sampled future
frames. Forward and backward flows are resized to the 64 x 64 latent grid.
cd amodal_completion
SEA_RAFT_ROOT=/path/to/SEA-RAFT \
SEA_RAFT_CFG=/path/to/SEA-RAFT/config/eval/spring-M.json \
SEA_RAFT_CHECKPOINT=/path/to/SEA-RAFT/checkpoint.pth \
FRAMES_ROOT=/path/to/behave/preprocessed \
WORK_ROOT=/path/to/behave/work \
CONFIG_PATH=/path/to/behave.json \
DEVICE=cuda:0 \
./scripts/run_flow_behave.sh
For InterCap, run ./scripts/run_flow_intercap.sh.
Each video produces:
WORK_ROOT/VIDEO_NAME/flow_dicts_past_future.pt
Run the remaining stages after optical flow is available:
cd amodal_completion
FRAMES_ROOT=/path/to/behave/preprocessed \
WORK_ROOT=/path/to/behave/work \
INPAINTED_ROOT=/path/to/behave/completed \
CONFIG_PATH=/path/to/behave.json \
SUBJECT=obj \
DEVICE=cuda:0 \
PIPELINE_STAGES="--extract-features --warp --fuse --inpaint" \
./scripts/run_behave.sh
The stages produce:
WORK_ROOT/VIDEO_NAME/
├── vae_features.npy
├── flow_dicts_past_future.pt
├── warped_feature_past_future.pt
└── obj_cross_attn_feature_past_future.pt
INPAINTED_ROOT/VIDEO_NAME/
└── 00000.inpainted.jpg
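Before starting reconstruction, it can help to confirm that every intermediate file exists. The sketch below checks the per-video files listed above for SUBJECT=obj; the helper name `missing_work_artifacts` is hypothetical, and the file name used for SUBJECT=person is not stated here, so it is not guessed.

```python
from pathlib import Path

# Per-video files listed above (SUBJECT=obj).
WORK_ARTIFACTS = (
    "vae_features.npy",
    "flow_dicts_past_future.pt",
    "warped_feature_past_future.pt",
    "obj_cross_attn_feature_past_future.pt",
)


def missing_work_artifacts(work_root, video_name):
    """Return the expected per-video files that are absent under work_root."""
    video_dir = Path(work_root) / video_name
    return [name for name in WORK_ARTIFACTS if not (video_dir / name).is_file()]
```

An empty list means all four files are present for that video.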
To run preprocessing and all completion stages in one command after flow has
already been generated, leave PIPELINE_STAGES unset. The launcher defaults to
--all.
See reconstruction/README.md for full details.
The object path is:
- Generate img_obj_poses.json from BEHAVE poses and camera calibration.
- Install the provided GSPose dataset/training adapters.
- Run my_gs_demo.py, optionally with --use_complete to train from amodally completed RGB images and masks.
- Export 3DGO_model.ply.
Install the GSPose adapters:
GSPOSE_ROOT=/path/to/GSPose \
GAUSSIAN_AVATAR_ROOT=/path/to/GaussianAvatar \
./reconstruction/install_integrations.sh
Then, from the GSPose repository:
python notebook/my_gs_demo.py \
--data-pth /path/to/sequence/img_obj_poses.json \
--resolution 1 \
--use_complete
The human path uses GaussianAvatar:
- Run reconstruction/human/make_data.py to build GaussianAvatar images, masks, SMPL-H fits, and camera parameters from BEHAVE.
- Convert SMPL-H to SMPL with the official SMPL-X transfer tool.
- Run pkl_to_smpl_params.py to create smpl_parms.pth.
- Train GaussianAvatar according to its official README.
The original rendering code is retained under reconstruction/rendering/
because it combines the object-only GSPose model with the human GaussianAvatar
model.
The rendering path is:
- Generate 30 fps SMPL and object poses with parse_obj_pose_10fps.py.
- Build novel-pose inputs with make_behave_novel_pose.py.
- Convert 3DGO_model.ply to the GaussianAvatar object tensor with my_gs_avatar_obj_save.py.
- Place that tensor in the novel-pose directory as object_gaussians.pth.
- Run my_render_novel_pose.py from the GaussianAvatar checkout.
Example:
python my_render_novel_pose.py \
-s /path/to/avatar/training/data \
-m /path/to/avatar/model/output \
--epoch 180 \
--my_test_folder /path/to/novel_pose/VIDEO_NAME/1
amodal_completion/main.py deliberately contains placeholder defaults:
Path("/path/to/behave/preprocessed")
Path("/path/to/behave/work")
Path("/path/to/intercap/preprocessed")
Path("/path/to/intercap/work")
Pass real paths through the launchers or command-line arguments. Do not replace these with private server paths before publishing.
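Because the defaults are placeholders, a run that forgets to override them would otherwise fail late with a missing-file error. The sketch below shows one way to fail early instead. It is an assumption for illustration, not code from main.py, and the names `is_placeholder` and `require_real_paths` are hypothetical.

```python
from pathlib import PurePosixPath

PLACEHOLDER_PREFIX = PurePosixPath("/path/to")


def is_placeholder(path):
    """True when `path` still sits under the /path/to placeholder prefix."""
    path = PurePosixPath(path)
    return path == PLACEHOLDER_PREFIX or PLACEHOLDER_PREFIX in path.parents


def require_real_paths(**named_paths):
    """Raise ValueError naming every argument that is still a placeholder."""
    unset = sorted(name for name, p in named_paths.items() if is_placeholder(p))
    if unset:
        raise ValueError("placeholder paths not overridden: " + ", ".join(unset))
```

For example, `require_real_paths(frames_root="/path/to/behave/preprocessed")` raises, while a real path such as `/data/behave/preprocessed` passes.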
- This repository does not include datasets, checkpoints, SMPL/SMPL-X models, or code owned by the linked external repositories.
- The experimental BEHAVE range and InterCap 16-frame truncation are preserved.
- HDM object categories without a class-specific checkpoint use the general checkpoint and std_coverage=3.5.
- CUDA execution is expected for HDM, SEA-RAFT, diffusion, GSPose, and GaussianAvatar.
- Review the licenses and dataset terms of every upstream project before redistributing derived code or data.
This code builds on HDM, BEHAVE, InterCap, SEA-RAFT, Stable Diffusion, Diffusers, GSPose, GaussianAvatar, SMPL/SMPL-X, PyTorch, and PyTorch3D. Cite the corresponding papers and repositories when using their components.