4-Camera Hand Motion Capture

This pipeline pairs four synchronized cameras with 2D hand pose detection (via MMPose or MediaPipe) to reconstruct 3D hand landmarks and visualize reprojection quality. Camera 0 defines the world frame; all coordinates are reported in meters relative to that camera.

Quick Start

Hardware checklist

4 cameras mounted in the diamond layout (Cam0 bottom-left, Cam1 top-left, Cam2 top-right, Cam3 bottom-right)
Rigid mounts with matching heights and slight inward tilt (≈10–15°)
9×6 inner-corner chessboard (square size 23 mm unless you change the scripts)
Even, diffuse lighting across the workspace

Software setup

```
pip install -r requirements.txt
mkdir -p video/camera video/hand
```

Download (or symlink) an MMDetection hand detector config/checkpoint and an MMPose top-down hand pose config/checkpoint; you will pass their paths to hand_inference.py via CLI flags.

Workflow

Record calibration videos
Move the chessboard through the whole capture volume, varying its position and tilt, while all four cameras record simultaneously. Save the clips as video/camera/cam0.mp4 … cam3.mp4.
Run calibration
```
python calibration.py
```
Produces output/calibration/multi_camera_calib.npz and multi_camera_rectify.npz. Target per-camera RMS < 0.5 px and stereo RMS < 1.0 px.
Record hand-motion videos
Capture synchronized hand footage and store it in video/hand/0.mp4 … 3.mp4 (or pass --sequence to use a different folder).
Run 2D hand inference
Pick the variant that best fits your use case:
- MMPose (default / highest accuracy)
```
python hand_inference.py \
    --det-config models/det/rtmdet_tiny_8xb32-300e_coco.py \
    --det-checkpoint models/det/rtmdet_tiny_8xb32-300e_coco_20220902_112414-78e30dcc.pth \
    --pose-config models/pose/rtmpose-m_8xb256-210e_hand5-256x256.py \
    --pose-checkpoint models/pose/rtmpose-m_simcc-hand5_pt-aic-coco_210e-256x256-74fb594_20230320.pth \
    --device cuda:0 \
    --sequence hand
```
  hand_inference.py is a thin wrapper around hand_inference_mmpose.py, so you can call either script with the same flags.
- MediaPipe (no configs/checkpoints required)
```
python hand_inference_mediapipe.py --sequence hand
```
  Useful for quick tests on CPU-only machines. Produces the same pickle format, so downstream steps remain unchanged.
Both variants write cached detections to output/detections/<sequence>_2d_detections.pkl. When run with --preview, both scripts also display the detected hand skeletons (21 keypoints plus connections) overlaid on each camera view in real time.

Triangulate 3D hands

```
python hand_triangulation.py \
    --detections output/detections/hand_2d_detections.pkl \
    --display
```

Writes multi- and single-hand 3D trajectories under output/tracking/.
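For intuition, the linear core of multi-view triangulation (the DLT method) can be sketched with NumPy. This is a minimal sketch under stated assumptions, not the code in hand_triangulation.py (which adds matching, bundle adjustment, and outlier rejection): it assumes undistorted pixel coordinates and the calibration convention P_i = R_i @ P_0 + T_i, so camera 0 uses R = I, T = 0. The helpers projection_matrix, triangulate_dlt, and project are hypothetical names.

```python
import numpy as np


def projection_matrix(K, R, T):
    """Build the 3x4 projection matrix K [R | T] for one camera."""
    return K @ np.hstack([R, np.asarray(T, dtype=float).reshape(3, 1)])


def triangulate_dlt(proj_mats, points_2d):
    """Triangulate one 3D point from N >= 2 views with the linear DLT method.

    proj_mats: list of 3x4 projection matrices (camera 0 = world frame).
    points_2d: list of (u, v) undistorted pixel coordinates, one per view.
    Returns the point in camera-0 coordinates (same units as T, i.e. meters).
    """
    rows = []
    for P, (u, v) in zip(proj_mats, points_2d):
        # Each view adds two linear constraints on the homogeneous point X:
        # u * (P[2] @ X) - P[0] @ X = 0 and v * (P[2] @ X) - P[1] @ X = 0
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # Least-squares solution: right singular vector with the smallest singular value
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def project(P, X):
    """Pinhole projection of a 3D point with a 3x4 matrix (no distortion)."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


# Usage with two synthetic cameras 10 cm apart along x
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P0 = projection_matrix(K, np.eye(3), np.zeros(3))
P1 = projection_matrix(K, np.eye(3), np.array([-0.1, 0.0, 0.0]))
X_true = np.array([0.05, -0.02, 0.5])
X_est = triangulate_dlt([P0, P1], [project(P0, X_true), project(P1, X_true)])
# With noise-free observations, X_est matches X_true up to floating-point error
```

With four views the same function simply receives four matrices and four detections; in practice each landmark would be triangulated only from the views where its confidence is high enough.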

Evaluate results (optional but recommended)
```
python evaluate.py
python check_hand_consistency.py
```
Generates a reprojection video and diagnostic plots under output/evaluation/ and output/visualization/.
Check calibration quality (optional)
```
python checkerboard_eval.py
```
Summarizes checkerboard reprojection errors (per frame and camera) as a sanity check.

Hand detector / pose configs (`hand_inference*.py`)

--det-config / --det-checkpoint: MMDetection hand detector (e.g., RTMDet hand). Set --det-cat-id if your detector uses a different class index (default 0).
--pose-config / --pose-checkpoint: MMPose top-down hand pose model (e.g., RTMPose). Make sure the model predicts 21 keypoints that follow the MediaPipe ordering.
--device: cpu, cuda:0, etc. Defaults to cuda:0 if available, otherwise cpu.
Optional --det-score-thr / --pose-score-thr tune per-camera detection filtering; --max-hands-per-view caps how many hands are kept in each camera view.
Use --sequence to read videos from a different video/<sequence>/ folder, and --output to rename the cached detection pickle.
MediaPipe variant ignores detector/pose config flags; tune its behavior via --det-score-thr, --pose-score-thr, --max-hands-per-view, and --max-frames.
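The filtering flags can be pictured as a per-view post-processing step. The sketch below is illustrative only: the detection dict layout (bbox_score, keypoints, keypoint_scores), the rule of averaging keypoint confidences against the pose threshold, and the default values are assumptions, not the internals of hand_inference_mmpose.py or hand_inference_mediapipe.py.

```python
import numpy as np


def filter_view_detections(detections, det_score_thr=0.5, pose_score_thr=0.3, max_hands_per_view=2):
    """Filter one camera view's hand detections (hypothetical data layout).

    detections: list of dicts with keys
        "bbox_score"      -> float detector confidence
        "keypoints"       -> (21, 2) array of pixel coordinates
        "keypoint_scores" -> (21,) array of per-keypoint confidences
    The default thresholds here are placeholders, not the scripts' defaults.
    """
    kept = []
    for det in detections:
        if det["bbox_score"] < det_score_thr:
            continue  # weak detector hit
        if float(np.mean(det["keypoint_scores"])) < pose_score_thr:
            continue  # pose estimate too uncertain overall
        kept.append(det)
    # Keep the most confident hands first, up to the per-view cap
    kept.sort(key=lambda d: d["bbox_score"], reverse=True)
    return kept[:max_hands_per_view]


def make_det(bbox_score, kp_score):
    """Build a dummy detection with uniform keypoint confidence."""
    return {
        "bbox_score": bbox_score,
        "keypoints": np.zeros((21, 2)),
        "keypoint_scores": np.full(21, kp_score),
    }


view = [make_det(0.9, 0.8), make_det(0.4, 0.9), make_det(0.7, 0.1), make_det(0.6, 0.6), make_det(0.95, 0.5)]
kept = filter_view_detections(view, det_score_thr=0.5, pose_score_thr=0.3, max_hands_per_view=2)
```

Raising either threshold trades recall for cleaner cross-view matching; the per-view cap mainly guards against spurious extra boxes.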

Project Layout

```
Hand_MoCap/
├── calibration.py                      # Step 1: chessboard-based calibration
├── hand_inference.py                   # Step 2: thin wrapper around hand_inference_mmpose.py
├── hand_inference_mmpose.py            # Step 2a: MMPose detection + pose estimation
├── hand_inference_mediapipe.py         # Step 2b: MediaPipe detection + pose estimation
├── hand_triangulation.py               # Step 3: multi-view matching & 3D reconstruction
├── evaluate.py                         # Step 4: reprojection video generation
├── check_hand_consistency.py           # Step 4: coverage, motion, and ID-consistency plots
├── checkerboard_eval.py                # Optional: calibration quality diagnostics
├── video_utils.py                      # Utility: video discovery helpers
├── video/                              # Input footage (camera + hand recordings)
│   ├── camera/                         # Calibration videos
│   └── hand/                           # Hand motion videos
├── output/                             # Generated artifacts (auto-created)
│   ├── calibration/                    # Calibration parameters
│   ├── detections/                     # Cached 2D detections
│   ├── tracking/                       # 3D hand trajectories
│   ├── evaluation/                     # Reprojection videos
│   └── visualization/                  # Diagnostic plots
├── models/                             # MMPose/MMDetection model files
│   ├── det/                            # Detection model configs/checkpoints
│   └── pose/                           # Pose model configs/checkpoints
└── requirements.txt                    # Python dependencies
```

Output directories

output/calibration/
- multi_camera_calib.npz – intrinsics (K0–K3), distortion (D0–D3), rotations (R1–R3), and translations (T1–T3) expressed from camera 0
- multi_camera_rectify.npz – rectification transforms (R1_01, P2_03, Q_02, …) for stereo matching
output/detections/
- <sequence>_2d_detections.pkl – cached per-frame, per-camera 2D keypoints produced by hand_inference.py
output/tracking/
- hand_3d_positions_multi.pkl – list of frames; each frame contains 0–2 hands with (21, 3) arrays in meters
- hand_3d_positions.npy – single-hand array (first hand per frame) with NaNs when no hand is present
output/evaluation/
- reprojection_4cam.mp4 – 2×2 grid showing original footage with reprojected landmarks
output/visualization/
- hand_consistency_check.png – coverage, motion, and ID-consistency plots

Delete output/ to reset the workspace; scripts recreate folders as needed.

Camera Setup Guidance

Top view (diamond layout):

```
    Cam1          Cam2
      \            /
       \          /
   45°  \        /  45°
         \      /
      [Hand Workspace]
         /      \
   45°  /        \  45°
       /          \
      /            \
    Cam0          Cam3
```

Positioning: keep all cameras ~0.5 m from the workspace center at a common height (~30 cm above the surface) and tilt inward by ~10–15°.
Baselines: expect ~0.7 m between adjacent cameras; the diagonal pairs (Cam0↔Cam2 and Cam1↔Cam3) form the widest baselines (~1 m) and provide the strongest depth cues.
Coverage: the most reliable capture volume is a 20 cm cube at the center; quality remains good out to ≈35 cm before dropping to two-camera coverage near the edges.
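The baseline figures above follow from the layout geometry. A minimal sketch, assuming the four cameras sit exactly 0.5 m from the workspace center along the 45° diagonals of the top view at equal heights (so the vertical offset cancels); the coordinates are a top-view frame for illustration, not the Camera 0 world frame.

```python
import itertools
import math

RADIUS_M = 0.5  # nominal camera-to-center distance from the setup guidance

# Top-view angles (degrees, x right / y up) for the diamond layout:
# Cam0 bottom-left, Cam1 top-left, Cam2 top-right, Cam3 bottom-right
ANGLES_DEG = {"Cam0": 225, "Cam1": 135, "Cam2": 45, "Cam3": 315}

positions = {
    name: (RADIUS_M * math.cos(math.radians(a)), RADIUS_M * math.sin(math.radians(a)))
    for name, a in ANGLES_DEG.items()
}

# Distance between every camera pair (math.dist needs Python 3.8+)
baselines = {
    (a, b): math.dist(positions[a], positions[b])
    for a, b in itertools.combinations(sorted(positions), 2)
}

for (a, b), d in sorted(baselines.items(), key=lambda item: item[1]):
    print(f"{a}<->{b}: {d:.3f} m")
```

Adjacent pairs come out at r·√2 ≈ 0.707 m and the two diagonals at 2r = 1.000 m, which is a useful reference when comparing against the baselines printed by calibration.py.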

Setup checklist

Mounts are rigid and heights match
Lighting is uniform with minimal glare or shadows
Cameras share the same resolution and frame rate (≥30 FPS)
Auto-exposure/white balance are consistent or locked
Recording start times are tightly synchronized (<100 ms skew)

Calibration & Tracking Details

`calibration.py`

Defaults: chessboard_size = (9, 6) inner corners, square_size = 0.023 m.
Samples one frame every sample_every_n_frames frames (default 50), stopping once max_frames synchronized frame sets have been collected.
Prints per-camera RMS and stereo RMS errors plus baselines. Recalibrate if RMS > 1.0 px or baselines are inconsistent.
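The defaults above translate into the usual OpenCV-style 3D chessboard model. A minimal sketch, assuming calibration.py builds its object points the standard way (board on the z = 0 plane, row-major corner order as returned by cv2.findChessboardCorners); sampled_frame_indices is a hypothetical helper and its max_frames value is illustrative. It also simplifies the real behavior by capping candidate frames rather than successful detections.

```python
import numpy as np

CHESSBOARD_SIZE = (9, 6)   # inner corners (columns, rows), as in calibration.py
SQUARE_SIZE_M = 0.023      # 23 mm squares


def chessboard_object_points(size=CHESSBOARD_SIZE, square=SQUARE_SIZE_M):
    """Return (cols*rows, 3) float32 corner coordinates on the z = 0 board plane, in meters."""
    cols, rows = size
    objp = np.zeros((cols * rows, 3), np.float32)
    # mgrid yields the (x, y) grid index of every corner; scale by the square size
    objp[:, :2] = np.mgrid[0:cols, 0:rows].T.reshape(-1, 2) * square
    return objp


def sampled_frame_indices(num_frames, sample_every_n_frames=50, max_frames=25):
    """Every n-th frame index, capped at max_frames candidates (illustrative value)."""
    return list(range(0, num_frames, sample_every_n_frames))[:max_frames]
```

If the physical square size differs from 23 mm, only the scale of the translations (and therefore the baselines) changes; the RMS reprojection errors stay the same, which is why a wrong square_size is easy to miss.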

`hand_inference_mmpose.py` / `hand_inference_mediapipe.py`

Runs 2D hand detection per camera view to cache 2D joints for every frame in a synchronized sequence.
MMPose variant: Uses MMDetection + MMPose pipeline; accepts --det-config, --pose-config, --device, and related flags.
MediaPipe variant: CPU-friendly alternative requiring no model downloads; tune via --det-score-thr and --pose-score-thr.
Both support --preview for real-time skeleton visualization (green lines + keypoints) and --sequence to select input folder.
Writes a pickle containing per-frame, per-camera detections (keypoints + confidences) under output/detections/.

`hand_triangulation.py`

Consumes the cached detections, camera calibration, and (optionally) the raw videos to match hands across views and triangulate them.
Performs bundle-adjusted triangulation with per-landmark outlier rejection; enable --debug-matching for verbose pairing logs and --display for reprojected overlays.
Saves both multi-hand (pickle) and single-hand (NumPy) trajectories under output/tracking/.

`evaluate.py`

Reprojects tracked 3D landmarks into all cameras to visually validate alignment.
Video layout is Cam0 | Cam1 over Cam2 | Cam3; green landmarks mark the first hand, magenta the second.
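Reprojection is the pinhole model applied with the stored calibration. A minimal sketch, assuming the P_i = R_i @ P_0 + T_i convention of multi_camera_calib.npz and ignoring lens distortion (evaluate.py may also apply D0–D3, for example through cv2.projectPoints); project_points and reprojection_errors are hypothetical helper names.

```python
import numpy as np


def project_points(points_cam0, K, R, T):
    """Project (N, 3) camera-0 points into a camera with intrinsics K and extrinsics R, T.

    Lens distortion is ignored in this sketch.
    """
    pts_cam = points_cam0 @ R.T + np.asarray(T, dtype=float).reshape(1, 3)  # P_i = R_i @ P_0 + T_i
    uvw = pts_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]


def reprojection_errors(points_cam0, observed_uv, K, R, T):
    """Per-landmark pixel distance between projected 3D landmarks and 2D detections."""
    projected = project_points(points_cam0, K, R, T)
    return np.linalg.norm(projected - observed_uv, axis=1)


K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
landmarks = np.array([[0.0, 0.0, 0.5], [0.05, 0.0, 0.5]])  # two points 0.5 m in front of camera 0
uv = project_points(landmarks, K, np.eye(3), np.zeros(3))
# uv -> [[320, 240], [400, 240]]
```

Averaging reprojection_errors over landmarks and frames per camera gives a single number to compare against the calibration RMS; a camera that is consistently worse usually points to a mount that moved.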

`check_hand_consistency.py`

Aggregates statistics such as per-frame hand counts, wrist trajectories, and inter-hand distances to spot ID swaps or dropouts.
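Similar statistics can be computed directly from hand_3d_positions_multi.pkl. A minimal sketch, assuming each frame is a list of (21, 3) arrays as described under Data Formats; the 5 cm jump threshold is an arbitrary illustrative value, not a setting of check_hand_consistency.py.

```python
import numpy as np

WRIST = 0  # MediaPipe landmark index of the wrist


def hand_counts(frames):
    """Number of tracked hands in every frame."""
    return np.array([len(hands) for hands in frames])


def wrist_jumps(frames, hand_idx=0, jump_thr_m=0.05):
    """Frames where hand `hand_idx` moves its wrist more than jump_thr_m between
    consecutive frames in which it is present (a possible ID swap)."""
    flagged = []
    prev = None
    for t, hands in enumerate(frames):
        if len(hands) <= hand_idx:
            prev = None  # hand missing: restart the comparison after the gap
            continue
        wrist = np.asarray(hands[hand_idx])[WRIST]
        if prev is not None and np.linalg.norm(wrist - prev) > jump_thr_m:
            flagged.append(t)
        prev = wrist
    return flagged


def inter_hand_distance(frames):
    """Wrist-to-wrist distance (m) for frames with at least two hands, NaN otherwise."""
    return np.array([
        np.linalg.norm(np.asarray(h[0])[WRIST] - np.asarray(h[1])[WRIST]) if len(h) >= 2 else np.nan
        for h in frames
    ])
```

A jump flagged at the same frame where the inter-hand distance changes abruptly is a strong hint that the two hand IDs were swapped.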

Data Formats

multi_camera_calib.npz
Load with np.load, access K*, D*, R*, T*, E*, F*. Rotations/Translations map camera 0 coordinates into other camera frames (P_i = R_i @ P_0 + T_i).
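Because the extrinsics map camera-0 coordinates into each camera frame, camera i's optical center in the camera-0 (world) frame solves 0 = R_i @ C_i + T_i, i.e. C_i = -R_iᵀ T_i, and its baseline to camera 0 is the norm of C_i. A minimal sketch, assuming the R1–R3 / T1–T3 keys listed under Output directories and translations in meters; camera_centers and pairwise_baselines are hypothetical helpers.

```python
import itertools

import numpy as np


def camera_centers(calib):
    """Camera centers in the camera-0 frame from a multi_camera_calib.npz-like mapping."""
    centers = {0: np.zeros(3)}
    for i in (1, 2, 3):
        R = np.asarray(calib[f"R{i}"], dtype=float)
        T = np.asarray(calib[f"T{i}"], dtype=float).reshape(3)
        # The point that maps to camera i's origin: 0 = R @ C + T  =>  C = -R^T @ T
        centers[i] = -R.T @ T
    return centers


def pairwise_baselines(centers):
    """Distance (m) between every pair of camera centers."""
    return {(a, b): float(np.linalg.norm(centers[a] - centers[b]))
            for a, b in itertools.combinations(sorted(centers), 2)}


# Usage (requires the calibration output):
# calib = np.load("output/calibration/multi_camera_calib.npz")
# for pair, d in pairwise_baselines(camera_centers(calib)).items():
#     print(pair, f"{d:.3f} m")
```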

hand_3d_positions_multi.pkl

```
import pickle

with open("output/tracking/hand_3d_positions_multi.pkl", "rb") as f:
    frames = pickle.load(f)
# frames[frame_idx][hand_idx][landmark_idx] -> (x, y, z) in meters
```

Landmarks follow MediaPipe ordering (0 wrist, 4 thumb tip, 8 index tip, 12 middle tip, 16 ring tip, 20 pinky tip).
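The MediaPipe ordering also fixes the skeleton edges, which enables simple physical checks such as bone lengths that should stay roughly constant over a recording. A minimal sketch; HAND_CONNECTIONS below mirrors MediaPipe's 21-landmark hand topology, and bone_lengths is a hypothetical helper.

```python
import numpy as np

# (parent, child) landmark pairs in MediaPipe's hand topology;
# palm edges are grouped with the finger they lead to
HAND_CONNECTIONS = [
    (0, 1), (1, 2), (2, 3), (3, 4),                    # thumb
    (0, 5), (5, 6), (6, 7), (7, 8),                    # index
    (5, 9), (9, 10), (10, 11), (11, 12),               # middle
    (9, 13), (13, 14), (14, 15), (15, 16),             # ring
    (13, 17), (0, 17), (17, 18), (18, 19), (19, 20),   # pinky
]


def bone_lengths(landmarks):
    """Edge lengths (meters) for landmarks shaped (..., 21, 3); returns (..., 21).

    NaN frames produce NaN lengths, so use nan-aware reductions over time.
    """
    landmarks = np.asarray(landmarks, dtype=float)
    parents = [a for a, _ in HAND_CONNECTIONS]
    children = [b for _, b in HAND_CONNECTIONS]
    return np.linalg.norm(landmarks[..., children, :] - landmarks[..., parents, :], axis=-1)
```

For example, np.nanstd(bone_lengths(positions), axis=0) on the (num_frames, 21, 3) array gives a per-bone spread; large values flag joints that triangulate poorly.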

hand_3d_positions.npy
Shape (num_frames, 21, 3); NaN rows indicate frames without a detected hand.
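Because missing frames are NaN-filled, a validity mask is the natural first step before computing statistics. A minimal sketch; treating a frame as valid only when all 63 coordinates are finite is an assumption (the exporter may only ever write whole-NaN frames, in which case any() and all() agree).

```python
import numpy as np


def valid_frame_mask(positions):
    """True for frames whose (21, 3) landmarks are all finite."""
    positions = np.asarray(positions, dtype=float)
    return np.isfinite(positions).all(axis=(1, 2))


# Usage with the exported file:
# positions = np.load("output/tracking/hand_3d_positions.npy")  # (num_frames, 21, 3)
# mask = valid_frame_mask(positions)
# print(f"{mask.mean():.1%} of frames contain a hand")
# wrist_track = positions[mask, 0]  # (num_valid, 3) wrist positions in meters
```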

Troubleshooting & Tips

Chessboard not detected: improve lighting, slow down board motion, confirm chessboard_size/square_size match the physical board.
High calibration error: capture more diverse poses (cover corners and tilt angles), ensure cameras remain fixed, clean lenses.
Hands appear gray or unmatched: check synchronization, lighting balance, and detection thresholds in hand_inference.py (--det-score-thr, --pose-score-thr).
Jittery trajectories: recalibrate, verify camera mounts, or apply temporal smoothing to the exported data.
Large reprojection error: re-run checkerboard_eval.py to locate problematic frames/cameras; recalibrate if mean error exceeds a few pixels.
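For jittery trajectories, a centered moving average that skips NaN frames is a simple starting point. A minimal sketch, assuming the (num_frames, 21, 3) layout of hand_3d_positions.npy; the window size is illustrative, and heavier filters (e.g. Savitzky–Golay or a One Euro filter) may suit fast motion better.

```python
import numpy as np


def smooth_nan_moving_average(positions, window=5):
    """Centered moving average over time that ignores NaN frames.

    positions: (num_frames, 21, 3) array; frames without a hand are NaN.
    Frames that were NaN stay NaN, so gaps are not filled in.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    positions = np.asarray(positions, dtype=float)
    half = window // 2
    frame_ok = np.isfinite(positions).all(axis=(1, 2))
    smoothed = np.full_like(positions, np.nan)
    for t in range(len(positions)):
        if not frame_ok[t]:
            continue  # keep missing frames missing
        lo, hi = max(0, t - half), min(len(positions), t + half + 1)
        # Average only the valid frames inside the window (frame t is always one of them)
        smoothed[t] = positions[lo:hi][frame_ok[lo:hi]].mean(axis=0)
    return smoothed
```

A moving average lags fast motions by about half the window, so keep the window small (3–5 frames at 30 FPS) if timing matters downstream.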

Requirements & Performance

Python 3.8+ with PyTorch, OpenCV, NumPy, and Matplotlib (see requirements.txt).
For MMPose: Requires MMPose, MMDetection, MMEngine, and MMCV packages plus model checkpoints.
For MediaPipe: Only requires the mediapipe package (installed via requirements.txt).
Typical run times on a modern GPU laptop: calibration ≈1 min (25 frames), tracking 5–10 FPS for 4 cameras, evaluation ≈30 FPS for rendering. CPU-only inference works but is significantly slower.

Next Steps

Use the pickle output for gesture recognition, biomechanics analysis, or downstream machine learning.
Tune detection settings via hand_inference.py flags (score thresholds, max hands) and triangulation heuristics with hand_triangulation.py (--max-hands-total, --reproj-rejection) to balance robustness and speed.
Extend the evaluator or diagnostics scripts to suit your application (e.g., export CSV summaries or integrate temporal filters).
