Christen Millerdurai1, Shaoxiang Wang1,2, Yaxu Xie1, Vladislav Golyanik3, Didier Stricker1,2, Alain Pagani1
1German Research Center for Artificial Intelligence (DFKI) | 2Rhineland-Palatinate Technical University of Kaiserslautern-Landau (RPTU) | 3Max Planck Institute for Informatics (MPII)
ACM SIGGRAPH Conference Proceedings, 2026
Reconstructing the absolute 3D pose and shape of the hands from the user’s viewpoint using a single head-mounted camera is crucial for practical egocentric interaction in AR/VR, telepresence, and hand-centric manipulation tasks, where sensing must remain compact and unobtrusive. While monocular RGB methods have made progress, they remain constrained by depth–scale ambiguity and struggle to generalize across the diverse optical configurations of head-mounted devices. As a result, models typically require extensive training on device-specific datasets, which are costly and laborious to acquire. This paper addresses these challenges by introducing EgoForce, a monocular 3D hand reconstruction framework that recovers robust, absolute 3D hand pose and its position from the user’s (camera-space) viewpoint. EgoForce operates across fisheye, perspective, and distorted wide-FOV camera models using a single unified network. Our approach combines a differentiable forearm representation that stabilizes hand pose, a unified arm–hand transformer that predicts both hand and forearm geometry from a single egocentric view, mitigating depth–scale ambiguity, and a ray space closed-form solver that enables absolute 3D pose recovery across diverse head-mounted camera models. Experiments on three egocentric benchmarks show that EgoForce achieves state-of-the-art 3D accuracy, reducing camera-space MPJPE by up to 28% on the HOT3D dataset compared to prior methods and maintaining consistent performance across camera configurations.
EgoForce processes a monocular egocentric RGB frame by extracting hand and forearm crops, tokenizing them, and conditioning the features on crop intrinsics (CIT). A transformer jointly infers hand–arm features to predict 2D keypoints (with confidences) and root-relative 3D hand and arm poses, which are lifted to camera-space meshes via the ray space solver. When the forearm is out of view, arm tokens are replaced with missing-arm tokens, and a hand-conditioned variational prior infers a plausible arm representation. We apply this workflow independently to the left and right hand-forearm crops.
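To make the lifting step more concrete, the sketch below shows one generic way a root-relative 3D pose can be placed in camera space from per-joint rays. It is not the paper's ray space solver, whose exact formulation is not given in this README. It assumes each 2D keypoint has already been unprojected to a ray through the camera origin (using whatever camera model applies) and solves for a single translation in closed form by weighted least squares. The names `solve_translation`, `det3`, and `solve3` are hypothetical.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve the 3x3 linear system a @ x = b with Cramer's rule."""
    d = det3(a)
    out = []
    for k in range(3):
        m = [[b[r] if c == k else a[r][c] for c in range(3)] for r in range(3)]
        out.append(det3(m) / d)
    return out


def solve_translation(joints_rel, rays, weights=None):
    """Translation t that brings joints_rel[i] + t closest to ray i.

    Minimizes sum_i w_i * ||P_i (X_i + t)||^2 with P_i = I - d_i d_i^T,
    where d_i is the unit direction of ray i. Setting the gradient to zero
    gives (sum_i w_i P_i) t = -(sum_i w_i P_i X_i).
    """
    weights = weights or [1.0] * len(joints_rel)
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for x, d, w in zip(joints_rel, rays, weights):
        norm = math.sqrt(sum(c * c for c in d))
        d = [c / norm for c in d]  # normalize the ray direction
        for r in range(3):
            for c in range(3):
                p_rc = (1.0 if r == c else 0.0) - d[r] * d[c]
                a[r][c] += w * p_rc
                b[r] -= w * p_rc * x[c]
    return solve3(a, b)


if __name__ == "__main__":
    # Toy data: four root-relative joints (metres) and rays that pass
    # exactly through the translated joints.
    joints = [(0.0, 0.0, 0.0), (0.05, 0.0, 0.01),
              (0.0, 0.08, -0.02), (-0.03, 0.02, 0.03)]
    shift = (0.1, -0.05, 0.4)
    rays = [tuple(j + s for j, s in zip(joint, shift)) for joint in joints]
    print(solve_translation(joints, rays))
```

Working with rays instead of pixel coordinates keeps the solve independent of the projection model, and the optional `weights` argument is where per-keypoint confidences could enter.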
The install script targets a Conda environment named egoforce and installs the CUDA 12.6, PyTorch 2.8, TensorRT, MMCV, AnyCalib, PyTorch3D, and Project Aria dependencies used by the repo.
conda create -n egoforce python=3.10 -y
conda activate egoforce
bash scripts/install.sh
The model weights, detector checkpoints, MANO files, and demo assets expected by settings.py live under the repo-local _DATA/ directory.
bash scripts/download_model_weights.sh
By default, the main checkpoint path is set in settings.py:
config.POSE_3D.CHECKPOINT_PATH = os.path.join(_DATA_DIR, 'model_weights.pth')
The dataset downloader clones the Hugging Face dataset repo with git-lfs and writes it to:
<data-root>/EgoForce
You must pass the destination explicitly:
bash scripts/download_datasets.sh --data-root /path/to/datasets
After download, update settings.py so config.DATASET.DIR points to your dataset root with a trailing slash, for example:
config.DATASET.DIR = "/path/to/datasets/"
The repo then resolves the dataset folders as:
- EgoForce/HOT3D
- EgoForce/ARCTIC
- EgoForce/H2O
Before running experiments, make sure these paths exist:
- Data root: _DATA/
- Datasets root: config.DATASET.DIR + "EgoForce/..."
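The path checklist above can be automated with a small pre-flight script. The following is a minimal sketch and not part of the repository: the helper names `with_trailing_slash` and `missing_paths` are hypothetical, and the dataset folder names are the three listed earlier (HOT3D, ARCTIC, H2O). Because the dataset root is described as a plain string concatenation with config.DATASET.DIR, the sketch also makes sure the root ends with a separator.

```python
import os

# Dataset folders this README lists under <data-root>/EgoForce.
DATASET_FOLDERS = ("HOT3D", "ARCTIC", "H2O")


def with_trailing_slash(path):
    """Return `path` ending in a separator, e.g. '/data' -> '/data/'."""
    return os.path.join(path, "")


def missing_paths(data_dir, dataset_dir):
    """Hypothetical helper: list the expected directories that do not exist."""
    root = with_trailing_slash(dataset_dir)
    expected = [data_dir] + [root + "EgoForce/" + name for name in DATASET_FOLDERS]
    return [p for p in expected if not os.path.isdir(p)]


if __name__ == "__main__":
    # Placeholder locations; substitute your own _DATA/ and dataset root.
    for path in missing_paths("_DATA", "/path/to/datasets"):
        print("missing:", path)
```

An empty result means both roots and all three dataset folders are in place.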
The main entrypoint is experiments/save_predictions.py. It runs EgoForce on a dataset split and saves a pickle file under _DATA/predictions/.
Supported datasets are:
- ARCTIC
- H2O
- HO3D
- HOT3D
- HOT3D_PINHOLE
- HOT3D_EQUISOLID
- HOT3D_EQUIRECTANGULAR
- HOT3D_STEREOGRAPHIC
Example:
python experiments/save_predictions.py \
--test-dataset-name ARCTIC \
--checkpoint-path _DATA/model_weights.pth
Common ablation and variant flags:
- --no-undistort-inp
- --no-cit
- --no-arm-prior
- --no-arm-input
- --anycalib-624
- --anycalib-pin
- --depth-model
- --dgp-model
Prediction files are written as:
_DATA/predictions/<DATASET>_<suffix>_predictions.pkl
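The naming pattern above can be reproduced in a few lines when you need to locate a prediction file from your own scripts. This is a sketch under assumptions: `prediction_path` is a hypothetical helper, the mapping from command-line flags to the suffix string is decided by the repo's scripts and is not reproduced here, and the sample suffix `undistort_inp_true` is simply the one that appears in file names elsewhere in this README.

```python
from pathlib import PurePosixPath

# Directory the README names as the prediction output location.
PREDICTIONS_DIR = PurePosixPath("_DATA/predictions")


def prediction_path(dataset, suffix):
    """Build '<DATASET>_<suffix>_predictions.pkl' under _DATA/predictions/."""
    return PREDICTIONS_DIR / f"{dataset}_{suffix}_predictions.pkl"


print(prediction_path("ARCTIC", "undistort_inp_true"))
# _DATA/predictions/ARCTIC_undistort_inp_true_predictions.pkl
```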
experiments/evaluate_predictions.py reads the saved prediction PKLs, applies the matching suffix logic, and writes evaluation summaries under results/OURS/.
Example:
python experiments/evaluate_predictions.py \
--test-dataset-name ARCTIC
If you evaluated a specific variant, pass the same flags used during prediction generation so the script resolves the correct suffix:
python experiments/evaluate_predictions.py \
--test-dataset-name HOT3D \
--no-cit
Useful options:
- --disable-kalman-filter disables translation smoothing. Kalman filtering is enabled by default.
- --results-root <dir> changes the output root from results/.
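For readers unfamiliar with what Kalman-based translation smoothing does, here is a generic illustration. It is not the filter used by evaluate_predictions.py: this README does not describe the state model or noise settings, so the sketch assumes an independent constant-position filter per axis with made-up variances. `ScalarKalman` and `smooth_translations` are hypothetical names.

```python
class ScalarKalman:
    """Constant-position Kalman filter for a single coordinate."""

    def __init__(self, process_var=1e-4, measurement_var=1e-2):
        self.q = process_var      # assumed process noise variance
        self.r = measurement_var  # assumed measurement noise variance
        self.x = None             # state estimate (set by the first sample)
        self.p = 1.0              # estimate variance

    def update(self, z):
        if self.x is None:
            self.x = z  # initialize from the first measurement
            return self.x
        p_pred = self.p + self.q           # predict: uncertainty grows
        gain = p_pred / (p_pred + self.r)  # Kalman gain
        self.x = self.x + gain * (z - self.x)
        self.p = (1.0 - gain) * p_pred
        return self.x


def smooth_translations(translations):
    """Filter a sequence of (x, y, z) translations, one filter per axis."""
    filters = [ScalarKalman() for _ in range(3)]
    return [tuple(f.update(c) for f, c in zip(filters, t)) for t in translations]


if __name__ == "__main__":
    track = [(0.0, 0.0, 0.40), (0.0, 0.0, 0.46), (0.0, 0.0, 0.41)]
    for t in smooth_translations(track):
        print(t)
```

Each output sample is pulled toward the previous estimate, which damps frame-to-frame jitter at the cost of some lag.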
experiments/save_noisy_intrinsic_predictions.py runs a HOT3D-only camera-noise sweep. It first estimates first-frame AnyCalib intrinsics, then evaluates multiple intrinsic noise levels and stores both prediction caches and camera-noise analysis artifacts.
python experiments/save_noisy_intrinsic_predictions.py
Optional controls:
- --no-cit
- --ray-grid-size
- --radial-bins
- --force-recompute
- --noisy-predictions-dir <dir>
This script writes noisy prediction PKLs, camera-noise analysis PKLs, AnyCalib intrinsics JSON files, and plots under _DATA/noisy_predictions/.
To aggregate the robustness results, run experiments/evaluate_noisy_intrinsic_predictions.py:
python experiments/evaluate_noisy_intrinsic_predictions.py
The default output directory is:
results/intrinsics_robustness
experiments/evaluate_hand_scale.py evaluates hand-scale consistency and calibration behavior from prediction PKLs. It can auto-discover predictions under _DATA/predictions/ by suffix, or you can pass files explicitly.
Auto-discovery example:
python experiments/evaluate_hand_scale.py --suffix undistort_inp_true
Explicit-file example:
python experiments/evaluate_hand_scale.py \
--hot3d-predictions _DATA/predictions/HOT3D_undistort_inp_true_predictions.pkl \
--arctic-predictions _DATA/predictions/ARCTIC_undistort_inp_true_predictions.pkl
By default, the script writes CSV summaries, plots, and a text report to:
results/hand_scale_eval/<suffix>/
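The suffix-based auto-discovery described above could look roughly like the following. This is a sketch of the idea only, assuming files follow the `<DATASET>_<suffix>_predictions.pkl` naming pattern; it does not reproduce the logic in evaluate_hand_scale.py, and `discover_predictions` is a hypothetical name.

```python
from pathlib import Path


def discover_predictions(predictions_dir, suffix):
    """Map dataset name -> file for files named <DATASET>_<suffix>_predictions.pkl."""
    tail = f"_{suffix}_predictions.pkl"
    found = {}
    for path in sorted(Path(predictions_dir).glob(f"*{tail}")):
        dataset = path.name[: -len(tail)]  # strip the tail to recover the dataset
        found[dataset] = path
    return found


if __name__ == "__main__":
    for name, path in discover_predictions("_DATA/predictions", "undistort_inp_true").items():
        print(name, "->", path)
```

Matching is done on the end of the file name, so the full suffix has to be passed; with `undistort_inp_true`, a file ending in `undistort_inp_true_no_arm_input_predictions.pkl` is not picked up.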
experiments/hand_joint_occlusion_graph.py compares ARCTIC predictions with and without forearm input, grouped by hand-joint visibility.
It expects these two prediction files to exist in _DATA/predictions/:
- ARCTIC_undistort_inp_true_predictions.pkl
- ARCTIC_undistort_inp_true_no_arm_input_predictions.pkl
Run:
python experiments/hand_joint_occlusion_graph.py
Artifacts are written under:
results/hand_joint_occlusion_graph/
The Gradio app in demo/run_app.py runs EgoForce on uploaded videos and shows the output video with the input view, ego-view render, and third-person render.
Start it with:
python demo/run_app.py
Useful launch options:
python demo/run_app.py --server-name 0.0.0.0 --server-port 7860
python demo/run_app.py --share
The live Aria demo in demo/run_aria.py streams RGB frames from a Project Aria device over USB and runs inference frame by frame.
Run:
python demo/run_aria.py
Notes:
- The streaming config in run_aria.py uses USB and ephemeral certificates.
- Check the Project Aria documentation for more details on device setup.
If you find this code useful for your research, please cite our paper:
@inproceedings{millerdurai2026egoforce,
title={EgoForce: Forearm-Guided Camera-Space 3D Hand Pose from a Monocular Egocentric Camera},
author={Millerdurai, Christen and Wang, Shaoxiang and Xie, Yaxu and Golyanik, Vladislav and Stricker, Didier and Pagani, Alain},
booktitle={Proceedings of the SIGGRAPH 2026 Conference Papers},
year={2026}
}
EgoForce is released under the CC-BY-NC 4.0 license. The license also applies to the pre-trained models.



