4-Camera Hand Motion Capture

This pipeline pairs four synchronized cameras with 2D hand pose detection (via MMPose or MediaPipe) to reconstruct 3D hand landmarks and visualize reprojection quality. Camera 0 defines the world frame; all coordinates are reported in meters relative to that camera.

Quick Start

Hardware checklist

  • 4 cameras mounted in the diamond layout (Cam0 bottom-left, Cam1 top-left, Cam2 top-right, Cam3 bottom-right)
  • Rigid mounts with matching heights and slight inward tilt (≈10–15°)
  • 9×6 inner-corner chessboard (square size 23 mm unless you change the scripts)
  • Even, diffuse lighting across the workspace

Software setup

pip install -r requirements.txt
mkdir -p video/camera video/hand

Download (or symlink) an MMDetection hand detector config/checkpoint and an MMPose top-down hand pose config/checkpoint; you will pass their paths to hand_inference.py via CLI flags.

Workflow

  1. Record calibration videos
    Move the chessboard throughout the capture volume while all four cameras record simultaneously. Save the recordings as video/camera/cam0.mp4 through cam3.mp4.

  2. Run calibration

    python calibration.py

    Produces output/calibration/multi_camera_calib.npz and multi_camera_rectify.npz. Target per-camera RMS < 0.5 px and stereo RMS < 1.0 px.

  3. Record hand-motion videos
    Capture synchronized hand footage and store it in video/hand/ as 0.mp4 through 3.mp4 (or pass --sequence to change the folder).

  4. Run 2D hand inference
    Pick the variant that best fits your use case:

    • MMPose (default / highest accuracy)
      python hand_inference.py \
          --det-config models/det/rtmdet_tiny_8xb32-300e_coco.py \
          --det-checkpoint models/det/rtmdet_tiny_8xb32-300e_coco_20220902_112414-78e30dcc.pth \
          --pose-config models/pose/rtmpose-m_8xb256-210e_hand5-256x256.py \
          --pose-checkpoint models/pose/rtmpose-m_simcc-hand5_pt-aic-coco_210e-256x256-74fb594_20230320.pth \
          --device cuda:0 \
          --sequence hand
      hand_inference.py is a thin wrapper around hand_inference_mmpose.py, so you can call either script with the same flags.
    • MediaPipe (no configs/checkpoints required)
      python hand_inference_mediapipe.py --sequence hand
      Useful for quick tests on CPU-only machines. Produces the same pickle format, so downstream steps remain unchanged.

    Both variants write cached detections to output/detections/<sequence>_2d_detections.pkl. With --preview, both also overlay the detected hand skeletons (21 keypoints plus connections) on each camera view in real time.

  5. Triangulate 3D hands

    python hand_triangulation.py \
        --detections output/detections/hand_2d_detections.pkl \
        --display

    Writes multi- and single-hand 3D trajectories under output/tracking/.

  6. Evaluate results (optional but recommended)

    python evaluate.py
    python check_hand_consistency.py

    Generates a reprojection video and diagnostic plots under output/evaluation/ and output/visualization/.

  7. Check calibration quality (optional)
    Running python checkerboard_eval.py summarizes checkerboard reprojection errors as a sanity check.

Hand detector / pose configs (hand_inference*.py)

  • --det-config / --det-checkpoint: MMDetection hand detector (e.g., RTMDet hand). Set --det-cat-id if your detector uses a different class index (default 0).
  • --pose-config / --pose-checkpoint: MMPose top-down hand pose model (e.g., RTMPose). Make sure the model predicts 21 keypoints that follow the MediaPipe ordering.
  • --device: cpu, cuda:0, etc. Defaults to cuda:0 if available, otherwise cpu.
  • Optional --det-score-thr / --pose-score-thr tune per-camera detection filtering; --max-hands-per-view limits per-camera tracking.
  • Use --sequence to target a different video/<sequence>/ folder, and --output to rename the cached detection pickle.
  • MediaPipe variant ignores detector/pose config flags; tune its behavior via --det-score-thr, --pose-score-thr, --max-hands-per-view, and --max-frames.

Project Layout

Hand_MoCap/
├── calibration.py                      # Step 1: chessboard-based calibration
├── hand_inference.py                   # Step 2: thin wrapper around hand_inference_mmpose.py
├── hand_inference_mmpose.py            # Step 2a: MMPose detection + pose estimation
├── hand_inference_mediapipe.py         # Step 2b: MediaPipe detection + pose estimation
├── hand_triangulation.py               # Step 3: multi-view matching & 3D reconstruction
├── evaluate.py                         # Step 4: reprojection video generation
├── check_hand_consistency.py           # Step 4: coverage and ID-consistency plots
├── checkerboard_eval.py                # Optional: calibration quality diagnostics
├── video_utils.py                      # Utility: video discovery helpers
├── video/                              # Input footage (camera + hand recordings)
│   ├── camera/                         # Calibration videos
│   └── hand/                           # Hand motion videos
├── output/                             # Generated artifacts (auto-created)
│   ├── calibration/                    # Calibration parameters
│   ├── detections/                     # Cached 2D detections
│   ├── tracking/                       # 3D hand trajectories
│   ├── evaluation/                     # Reprojection videos
│   └── visualization/                  # Consistency plots
├── models/                             # MMPose/MMDetection model files
│   ├── det/                            # Detection model configs/checkpoints
│   └── pose/                           # Pose model configs/checkpoints
└── requirements.txt                    # Python dependencies

Output directories

  • output/calibration/
    • multi_camera_calib.npz – intrinsics (K0–K3), distortion (D0–D3), rotations (R1–R3), and translations (T1–T3) expressed relative to camera 0
    • multi_camera_rectify.npz – rectification transforms (R1_01, P2_03, Q_02, …) for stereo matching
  • output/detections/
    • <sequence>_2d_detections.pkl – cached per-frame, per-camera 2D keypoints produced by hand_inference.py
  • output/tracking/
    • hand_3d_positions_multi.pkl – list of frames; each frame contains 0–2 hands with (21, 3) arrays in meters
    • hand_3d_positions.npy – single-hand array (first hand per frame) with NaNs when no hand is present
  • output/evaluation/
    • reprojection_4cam.mp4 – 2×2 grid showing original footage with reprojected landmarks
  • output/visualization/
    • hand_consistency_check.png – coverage, motion, and ID-consistency plots

Delete output/ to reset the workspace; scripts recreate folders as needed.

Camera Setup Guidance

Top view (diamond layout):

    Cam1          Cam2
      \            /
       \          /
   45°  \        /  45°
         \      /
      [Hand Workspace]
         /      \
   45°  /        \  45°
       /          \
      /            \
    Cam0          Cam3
  • Positioning: keep all cameras ~0.5 m from the workspace center at a common height (~30 cm above the surface) and tilt inward by ~10–15°.
  • Baselines: expect ~0.7 m between adjacent cameras; Camera 1↔3 forms the widest pair (~1 m) and provides strong depth cues.
  • Coverage: the most reliable capture volume is a 20 cm cube at the center; quality remains good out to ≈35 cm before dropping to two-camera coverage near the edges.
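The quoted baselines follow from the layout geometry. As a quick check, assume the four cameras sit on a circle of radius 0.5 m around the workspace center at 90° intervals. This is an idealization of the diagram above, and `diamond_baselines` is a hypothetical helper, not part of the repository.

```python
import math
from typing import Tuple


def diamond_baselines(radius_m: float) -> Tuple[float, float]:
    """Return (adjacent, diagonal) camera spacing in meters.

    Assumes four cameras on a circle of the given radius, 90 degrees apart.
    """
    adjacent = radius_m * math.sqrt(2.0)  # chord subtending 90 degrees
    diagonal = 2.0 * radius_m             # chord subtending 180 degrees
    return adjacent, diagonal


adjacent, diagonal = diamond_baselines(0.5)
print(f"adjacent ~ {adjacent:.2f} m, diagonal = {diagonal:.2f} m")
```

With a 0.5 m radius this gives about 0.71 m between adjacent cameras and 1.0 m across a diagonal, which agrees with the figures above. If the baselines printed by calibration.py differ substantially from these values, the physical layout or the calibration is worth rechecking.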

Setup checklist

  • Mounts are rigid and heights match
  • Lighting is uniform with minimal glare or shadows
  • Cameras share the same resolution and frame rate (≥30 FPS)
  • Auto-exposure/white balance are consistent or locked
  • Recording start times are tightly synchronized (<100 ms skew)
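To judge how much a given start-time skew matters, it helps to express it in frames. The following is a small arithmetic sketch; the helper name is illustrative only.

```python
def skew_in_frames(skew_ms: float, fps: float) -> float:
    """Convert a start-time skew in milliseconds into a frame count."""
    return skew_ms * fps / 1000.0


frames_at_30 = skew_in_frames(100, 30)
frames_at_60 = skew_in_frames(100, 60)
print(frames_at_30, frames_at_60)
```

At 30 FPS a 100 ms skew corresponds to 3 frames, and at 60 FPS to 6, so for fast hand motion it may be worth aiming well below the limit.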

Calibration & Tracking Details

calibration.py

  • Defaults: chessboard_size = (9, 6) inner corners, square_size = 0.023 m.
  • Samples one synchronized frame set every sample_every_n_frames frames (default 50), up to max_frames sets.
  • Prints per-camera RMS and stereo RMS errors plus baselines. Recalibrate if RMS > 1.0 px or baselines are inconsistent.
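The defaults above determine the 3D board coordinates and the frames that get sampled. The sketch below shows one common way to build both, following the usual OpenCV chessboard idiom. It is an assumption about how such a script is typically written, not a copy of calibration.py, and both function names are hypothetical.

```python
import numpy as np


def chessboard_object_points(chessboard_size=(9, 6), square_size=0.023):
    """Return (N, 3) board-frame corner coordinates in meters (z = 0)."""
    cols, rows = chessboard_size
    objp = np.zeros((cols * rows, 3), dtype=np.float32)
    # Grid of (x, y) corner indices, x varying fastest, scaled to meters.
    objp[:, :2] = np.mgrid[0:cols, 0:rows].T.reshape(-1, 2) * square_size
    return objp


def sampled_frame_indices(total_frames, sample_every_n_frames, max_frames):
    """Pick every n-th frame index, keeping at most max_frames of them."""
    return list(range(0, total_frames, sample_every_n_frames))[:max_frames]


objp = chessboard_object_points()
indices = sampled_frame_indices(1500, 50, 25)
print(objp.shape, indices[:3], len(indices))
```

If the printed board does not match chessboard_size or square_size, the reconstructed scale will be off, because these object points define what "one meter" means for the whole rig.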

hand_inference_mmpose.py / hand_inference_mediapipe.py

  • Runs 2D hand detection per camera view to cache 2D joints for every frame in a synchronized sequence.
  • MMPose variant: Uses MMDetection + MMPose pipeline; accepts --det-config, --pose-config, --device, and related flags.
  • MediaPipe variant: CPU-friendly alternative requiring no model downloads; tune via --det-score-thr and --pose-score-thr.
  • Both support --preview for real-time skeleton visualization (green lines + keypoints) and --sequence to select input folder.
  • Writes a pickle containing per-frame, per-camera detections (keypoints + confidences) under output/detections/.

hand_triangulation.py

  • Consumes the cached detections, camera calibration, and (optionally) the raw videos to match hands across views and triangulate them.
  • Performs bundle-adjusted triangulation with per-landmark outlier rejection; enable --debug-matching for verbose pairing logs and --display for reprojected overlays.
  • Saves both multi-hand (pickle) and single-hand (NumPy) trajectories under output/tracking/.
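For intuition about the triangulation step, here is a minimal linear (DLT) triangulation of a single landmark from several views. It is a generic textbook sketch, not the script's bundle-adjusted method with outlier rejection, and it ignores lens distortion. The function names and the synthetic camera values are assumptions made for illustration.

```python
import numpy as np


def projection_matrix(K, R, T):
    """Build a 3x4 projection matrix from intrinsics and a cam0-to-cam_i pose."""
    return K @ np.hstack([R, np.reshape(T, (3, 1))])


def triangulate_dlt(projections, points_2d):
    """Triangulate one 3D point from two or more views via linear least squares."""
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The homogeneous solution is the right singular vector
    # belonging to the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


# Synthetic two-camera check (all values below are made up).
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P0 = projection_matrix(K, np.eye(3), np.zeros(3))
P1 = projection_matrix(K, np.eye(3), np.array([-0.7, 0.0, 0.0]))
point = np.array([0.1, -0.05, 0.5])

observations = []
for P in (P0, P1):
    uvw = P @ np.append(point, 1.0)
    observations.append(uvw[:2] / uvw[2])

recovered = triangulate_dlt([P0, P1], observations)
print(recovered)
```

With noise-free observations the recovered point matches the input. With real detections, more views and a nonlinear refinement generally reduce the error, which is the role of the bundle-adjusted step described above.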

evaluate.py

  • Reprojects tracked 3D landmarks into all cameras to visually validate alignment.
  • The video is a 2×2 grid with Cam0 | Cam1 on the top row and Cam2 | Cam3 on the bottom row; green landmarks mark the first hand, magenta the second.
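Reprojection can be sketched with the documented convention P_i = R_i @ P_0 + T_i followed by a pinhole projection. The version below ignores lens distortion (the D* coefficients) for brevity, so it is only an approximation of what evaluate.py draws, and `reproject` is a hypothetical helper.

```python
import numpy as np


def reproject(points_cam0, K, R=None, T=None):
    """Project (N, 3) camera-0 points into a camera's pixel coordinates.

    Pass R and T for cameras 1-3; leave them as None for camera 0.
    Lens distortion is not modeled here.
    """
    pts = np.asarray(points_cam0, dtype=float).reshape(-1, 3)
    if R is not None:
        # P_i = R_i @ P_0 + T_i, applied row-wise.
        pts = pts @ np.asarray(R).T + np.asarray(T).reshape(1, 3)
    uvw = pts @ np.asarray(K).T
    return uvw[:, :2] / uvw[:, 2:3]


# Synthetic check; with real data the arrays would come from
# np.load("output/calibration/multi_camera_calib.npz"), e.g. calib["K1"].
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
landmark = np.array([[0.0, 0.0, 0.5]])
uv_cam0 = reproject(landmark, K)
uv_cam1 = reproject(landmark, K, np.eye(3), np.array([0.1, 0.0, 0.0]))
print(uv_cam0, uv_cam1)
```

Comparing these pixel positions with the cached 2D detections gives a per-landmark reprojection error in pixels.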

check_hand_consistency.py

  • Aggregates statistics such as per-frame hand counts, wrist trajectories, and inter-hand distances to spot ID swaps or dropouts.
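The same kind of statistics can be computed directly from the multi-hand frame list. The sketch below counts hands per frame and flags frames where a wrist jumps further than a threshold between consecutive frames, which can hint at an ID swap. The function names and the 0.10 m threshold are illustrative assumptions, not values taken from check_hand_consistency.py.

```python
import numpy as np


def hand_counts(frames):
    """Number of tracked hands in each frame."""
    return [len(hands) for hands in frames]


def wrist_jumps(frames, hand_idx=0, max_step_m=0.10):
    """Frame indices where the wrist (landmark 0) moved more than
    max_step_m since the immediately preceding frame."""
    flagged = []
    prev = None
    for i, hands in enumerate(frames):
        if len(hands) <= hand_idx:
            prev = None  # dropout: do not compare across the gap
            continue
        wrist = np.asarray(hands[hand_idx], dtype=float)[0]
        if prev is not None and np.linalg.norm(wrist - prev) > max_step_m:
            flagged.append(i)
        prev = wrist
    return flagged


# Synthetic frames: a dropout at frame 2 and a 0.5 m jump at frame 4.
base = np.zeros((21, 3))
frames = [[base], [base + 0.01], [], [base + 0.5], [base]]
counts = hand_counts(frames)
jumps = wrist_jumps(frames)
print(counts, jumps)
```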

Data Formats

  • multi_camera_calib.npz
    Load with np.load and access the K*, D*, R*, T*, E*, and F* arrays. The rotations and translations map camera 0 coordinates into the other camera frames (P_i = R_i @ P_0 + T_i).

  • hand_3d_positions_multi.pkl

    import pickle
    with open("output/tracking/hand_3d_positions_multi.pkl", "rb") as f:
        frames = pickle.load(f)
    # frames[frame_idx][hand_idx][landmark_idx] -> (x, y, z) in meters

    Landmarks follow MediaPipe ordering (0 wrist, 4 thumb tip, 8 index tip, 12 middle tip, 16 ring tip, 20 pinky tip).

  • hand_3d_positions.npy
    Shape (num_frames, 21, 3); NaN rows indicate frames without a detected hand.
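As an example of consuming both trajectory formats, the sketch below computes a thumb-to-index pinch distance per frame from the multi-hand list and builds a validity mask for the single-hand array. The helper names are hypothetical, and the synthetic data stands in for the real files.

```python
import numpy as np

THUMB_TIP, INDEX_TIP = 4, 8  # MediaPipe landmark indices


def pinch_distances(frames, hand_idx=0):
    """Thumb-tip to index-tip distance in meters per frame (NaN if no hand)."""
    out = np.full(len(frames), np.nan)
    for i, hands in enumerate(frames):
        if len(hands) > hand_idx:
            hand = np.asarray(hands[hand_idx], dtype=float)
            out[i] = np.linalg.norm(hand[THUMB_TIP] - hand[INDEX_TIP])
    return out


def valid_frame_mask(single_hand):
    """True for frames of a (num_frames, 21, 3) array that contain no NaNs."""
    return ~np.isnan(single_hand).any(axis=(1, 2))


# Synthetic stand-ins for the pickle and .npy contents.
hand = np.zeros((21, 3))
hand[INDEX_TIP] = [0.03, 0.04, 0.0]
pinch = pinch_distances([[hand], []])

single = np.zeros((3, 21, 3))
single[1] = np.nan
mask = valid_frame_mask(single)
print(pinch, mask)
```

With real data, `frames` would come from the pickle shown above and `single` from np.load("output/tracking/hand_3d_positions.npy").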

Troubleshooting & Tips

  • Chessboard not detected: improve lighting, slow down board motion, confirm chessboard_size/square_size match the physical board.
  • High calibration error: capture more diverse poses (cover corners and tilt angles), ensure cameras remain fixed, clean lenses.
  • Hands appear gray or unmatched: check synchronization, lighting balance, and detection thresholds in hand_inference.py (--det-score-thr, --pose-score-thr).
  • Jittery trajectories: recalibrate, verify camera mounts, or apply temporal smoothing to the exported data.
  • Large reprojection error: re-run checkerboard_eval.py to locate problematic frames/cameras; recalibrate if mean error exceeds a few pixels.
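For the temporal smoothing suggested above, one simple option is a NaN-aware centered moving average over the exported single-hand array. This is a minimal sketch under that assumption; the function name and window size are illustrative, and other filters may suit fast motion better.

```python
import numpy as np


def smooth_trajectory(traj, window=5):
    """Centered moving average along the time axis of a (T, 21, 3) array.

    NaN samples are skipped when averaging, and frames that were NaN in
    the input stay NaN, so gaps are not filled. Use an odd window.
    """
    traj = np.asarray(traj, dtype=float)
    half = window // 2
    out = np.full_like(traj, np.nan)
    for t in range(traj.shape[0]):
        chunk = traj[max(0, t - half): t + half + 1]
        valid = ~np.isnan(chunk)
        count = valid.sum(axis=0)
        total = np.where(valid, chunk, 0.0).sum(axis=0)
        keep = (count > 0) & ~np.isnan(traj[t])
        out[t][keep] = total[keep] / count[keep]
    return out


# Synthetic trajectory: constant position with one missing frame.
traj = np.ones((10, 21, 3))
traj[4] = np.nan
smoothed = smooth_trajectory(traj)
print(np.isnan(smoothed[4]).all(), smoothed[0, 0])
```

A larger window reduces jitter further but also lags and flattens quick gestures, so the window is best chosen against the frame rate and the motion being studied.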

Requirements & Performance

  • Python 3.8+ with PyTorch, OpenCV, NumPy, and Matplotlib (see requirements.txt).
  • For MMPose: Requires MMPose, MMDetection, MMEngine, and MMCV packages plus model checkpoints.
  • For MediaPipe: Only requires the mediapipe package (installed via requirements.txt).
  • Typical run times on a modern GPU laptop: calibration ≈1 min (25 frames), tracking 5–10 FPS for 4 cameras, evaluation ≈30 FPS for rendering. CPU-only inference works but is significantly slower.

Next Steps

  • Use the pickle output for gesture recognition, biomechanics analysis, or downstream machine learning.
  • Tune detection settings via hand_inference.py flags (score thresholds, max hands) and triangulation heuristics with hand_triangulation.py (--max-hands-total, --reproj-rejection) to balance robustness and speed.
  • Extend the evaluator or diagnostics scripts to suit your application (e.g., export CSV summaries or integrate temporal filters).