Hand-tracking based control system for the Faze4 robotic arm simulator in Unity. The robot end-effector follows your hand position in real time using MediaPipe, PyBullet inverse kinematics, and a live TCP camera stream to Unity.
This project extends the open-source Faze4 Unity simulator by kholodilinivan/Faze4-Robotic-arm-simulator with:
- Real-time hand tracking via webcam (MediaPipe)
- PyBullet IK solving for the Faze4 URDF
- TCP robot control stream to Unity (port 55001)
- TCP live camera feed stream to Unity (port 55002)
- Grab detection (open/closed hand)
- Z-axis depth control via wrist-to-MCP landmark distance
- (v7) Exponential Moving Average (EMA) smoothing for jitter-free movement
Problem: Raw coordinates from MediaPipe hand tracking contain micro-fluctuations (jitter), causing the robot to shake even when the user holds their hand still. Additionally, the default vertical workspace was too high for comfortable desk use.
Fix in v7:
- An Exponential Moving Average (EMA) filter is now applied to the target coordinates before they are sent to the IK solver. This averages out the micro-movements, resulting in smooth, stable, and fluid robot motion. The smoothing intensity can be tuned using the `alpha` variable (default `0.15`).
- `WORKSPACE_Z` is lowered to `-0.270` to shift the physical rest point of the robot downwards, making it more comfortable to control from a standard desk seated position without altering the actual movement scale.
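The EMA filter can be sketched in a few lines. This is a minimal illustration of the technique, not the exact code from `v7_smooth_control.py`; the function and variable names here are illustrative.

```python
ALPHA = 0.15  # smoothing factor: lower = smoother but laggier, higher = more responsive

def ema(previous, raw, alpha=ALPHA):
    """Blend the new raw target into the running average."""
    if previous is None:           # first frame: no history yet
        return raw
    return tuple(alpha * r + (1.0 - alpha) * p for p, r in zip(previous, raw))

# A jittery-but-stationary hand converges to (and stays at) the true position:
smoothed = None
for raw_target in [(0.35, -0.25, -0.27)] * 50:
    smoothed = ema(smoothed, raw_target)
```

Because each output is 85% history and only 15% new measurement, single-frame spikes are heavily damped, at the cost of a slight lag behind fast hand movements.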
Problem: The camera was calibrated with a high-resolution (2270×1514) chessboard image, but the live stream operates at a lower resolution (640×480). This mismatch caused the MM_PER_PX scaling factor to be approximately 3.5 times too small, which broke the Z-axis depth control.
Fix in v6: All calibration matrices and parameters have been mathematically scaled to match the 640×480 live resolution.
- `MM_PER_PX` is now `1.449` mm/px (the metrically correct value for the stream resolution).
- `MOTION_SCALE` is now `0.000423` (recalculated to preserve the same robot motion range).
- `HAND_WRIST_TO_MCP_MM` (e.g., `80.0` mm) now correctly corresponds to real-world hand size for accurate depth tracking.
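The arithmetic behind the rescaling is simple: the calibration image is 2270 px wide but the live stream is only 640 px, so each live pixel spans roughly 3.5x more of the scene, and mm/px must grow by that factor. The `0.4085` mm/px figure below is illustrative (the value a chessboard calibration would yield at full resolution), chosen to reproduce the README's corrected value.

```python
CALIB_W, LIVE_W = 2270, 640
scale = CALIB_W / LIVE_W             # ≈ 3.547: each live pixel covers ~3.5x more mm

mm_per_px_calib = 0.4085             # mm/px at 2270 px wide (illustrative)
MM_PER_PX = mm_per_px_calib * scale  # ≈ 1.449 mm/px, correct for the 640x480 stream

# Reusing the unscaled high-resolution value at 640x480 leaves MM_PER_PX
# ~3.5x too small, which is exactly the v5 bug described above.
```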
Faze4-Robotic-arm-simulator/
├── UnityProject/ ← Faze4 Unity simulator (original)
│ └── Assets/Scripts/
│ └── CameraStreamReceiver.cs ← added: displays Python camera feed
├── faze.urdf ← robot model for PyBullet IK
├── calibration.ipynb ← camera calibration & mm/px scaling
├── v1_basic_xy_tracking.py
├── v2_grab_detection.py
├── v3_unity_camera_stream.py
├── v4_axis_correction_autoreconnect.py
├── v5_z_depth_control.py
├── v6_resolution_fixed.py
├── v7_smooth_control.py ← latest version
├── CameraStreamReceiver.cs ← same script as in the UnityProject folder, for convenience
└── README.md
`pip install opencv-python mediapipe pybullet numpy`, plus Unity 2021+ with the Faze4 project open.
- Open the Unity project
- Assign `CameraStreamReceiver.cs` to a GameObject with a `RawImage` component
- Press Play in the editor
- Run the latest Python script:
`python v7_smooth_control.py`

The Python script and Unity will connect automatically — startup order does not matter. Both TCP channels (robot control + camera stream) retry until the other side is ready.
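The order-independent startup comes down to a connect-with-retry loop on the Python side. This is a minimal sketch of that behaviour, not the project's actual networking code; only the port number comes from the README.

```python
import socket
import time

def connect_with_retry(host="127.0.0.1", port=55001, delay=1.0):
    """Keep trying to connect until Unity's listener accepts."""
    while True:
        try:
            return socket.create_connection((host, port))
        except OSError:
            time.sleep(delay)  # Unity not ready yet, retry shortly
```

The same pattern on both channels (55001 and 55002) is what makes startup order irrelevant: whichever side comes up first simply waits for the other.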
| Gesture | Action |
|---|---|
| Open hand, move left/right | Robot Y-axis |
| Open hand, move up/down | Robot X-axis |
| Move hand toward/away from camera | Robot Z-axis (depth) |
| Close fist (< 2 fingers up) | Gripper close |
| Open hand | Gripper open |
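The "< 2 fingers up" rule can be implemented by comparing MediaPipe hand landmarks. The landmark indices below are MediaPipe's standard ones (tip vs. PIP joint per finger, with image y growing downward); the threshold logic is a simplified stand-in for what `v2_grab_detection.py` does, not a copy of it.

```python
FINGER_TIPS = [8, 12, 16, 20]   # index, middle, ring, pinky fingertips
FINGER_PIPS = [6, 10, 14, 18]   # corresponding PIP (middle) joints

def fingers_up(landmarks):
    """landmarks: list of 21 (x, y) tuples in image coordinates.
    A finger counts as 'up' if its tip is above (smaller y than) its PIP joint."""
    return sum(1 for tip, pip in zip(FINGER_TIPS, FINGER_PIPS)
               if landmarks[tip][1] < landmarks[pip][1])

def is_grabbing(landmarks):
    return fingers_up(landmarks) < 2   # matches the "< 2 fingers up" rule above
```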
| Parameter | Default | Description |
|---|---|---|
| `alpha` (in IK loop) | `0.15` | Smoothing factor. Lower = smoother but slight delay; higher = more responsive but more jitter. |
| `MOTION_SCALE` | `0.000423` | Amplifies XY hand motion to fit the robot's workspace for comfortable desk use. |
| `Z_SCALE` | `0.010` | Amplifies Z-axis (depth) hand motion for responsive control without large arm movements. |
| `HAND_WRIST_TO_MCP_MM` | `80.0` | Your physical wrist-to-knuckle distance (mm). Critical for Z-axis depth calibration. |
| `WORKSPACE_X_OFFSET` | `0.35` | Robot rest position X (meters). |
| `WORKSPACE_Y_OFFSET` | `-0.25` | Robot rest position Y (meters). |
| `WORKSPACE_Z` | `-0.270` | Robot neutral height Z (meters), optimized for desk use. |
| `MIRROR_CAMERA` | `True` | Mirror the feed so hand left = screen left. |
| `JPEG_QUALITY` | `60` | JPEG quality of the camera stream sent to Unity. |
The system translates hand movements into robot commands through a multi-step scaling process designed for responsive and ergonomic desktop use.
1. **Pixel to Millimeter Conversion:** The hand's wrist position in pixels $(x_{px}, y_{px})$ is first converted to physical millimeters in the camera's 2D plane using the corrected `MM_PER_PX` factor (1.449 mm/px). This provides a metric measurement of hand movement at the calibration distance.
2. **XY-Axis Workspace Mapping:** The millimeter coordinates $(x_{mm}, y_{mm})$ are mapped to the robot's XY workspace. Instead of a direct 1:1 metric conversion (which would use a `MOTION_SCALE` of `0.001`), an empirically chosen `MOTION_SCALE` of `0.000423` is used.
   - *Reasoning:* A true 1:1 scale would require impractically large hand movements to cover the robot's full range from a desk. The smaller `MOTION_SCALE` value amplifies hand motion, allowing the user to control the entire robot workspace with comfortable, small movements.
3. **Z-Axis (Depth) Control:** Depth is estimated by measuring the apparent distance between the user's wrist and index finger knuckle (`d_img`) in the camera view. This is compared to a pre-measured reference distance (`d_ref`, e.g., 80 mm). The difference is scaled by `Z_SCALE`.
   - *Reasoning:* A `Z_SCALE` of `0.010` is used instead of a 1:1 scale (`0.001`). This amplifies the small, natural changes in hand-to-camera distance into a full, usable Z-axis range for the robot, avoiding the need for large forward/backward arm movements.
4. **Inverse Kinematics (IK):** The final smoothed target position vector `[X, Y, Z]` is fed into the PyBullet IK solver, which calculates the required joint angles for the Faze4 arm. These angles are then streamed to Unity.
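Steps 1–3 above can be condensed into one mapping function. This is a sketch using the defaults from the parameter table; `wrist_px` and `mcp_px` stand in for MediaPipe landmark output, the axis assignment follows the gesture table (left/right → robot Y, up/down → robot X), and the exact sign conventions in the real scripts may differ.

```python
import math

MM_PER_PX = 1.449
MOTION_SCALE = 0.000423
Z_SCALE = 0.010
HAND_WRIST_TO_MCP_MM = 80.0
WORKSPACE_X_OFFSET, WORKSPACE_Y_OFFSET, WORKSPACE_Z = 0.35, -0.25, -0.270
FRAME_W, FRAME_H = 640, 480

def hand_to_target(wrist_px, mcp_px):
    # 1) pixel -> mm, relative to the frame centre
    x_mm = (wrist_px[0] - FRAME_W / 2) * MM_PER_PX
    y_mm = (wrist_px[1] - FRAME_H / 2) * MM_PER_PX
    # 2) mm -> robot XY workspace, amplified by MOTION_SCALE
    x = WORKSPACE_X_OFFSET + y_mm * MOTION_SCALE   # vertical hand motion -> robot X
    y = WORKSPACE_Y_OFFSET + x_mm * MOTION_SCALE   # horizontal hand motion -> robot Y
    # 3) depth from apparent wrist-to-MCP distance vs. the measured reference
    d_img = math.dist(wrist_px, mcp_px) * MM_PER_PX
    z = WORKSPACE_Z + (d_img - HAND_WRIST_TO_MCP_MM) * Z_SCALE
    # 4) in the real pipeline the (smoothed) [x, y, z] then goes to
    #    PyBullet's inverse kinematics solver for the Faze4 URDF
    return (x, y, z)
```

With the hand centred in the frame and the wrist-to-knuckle distance matching the 80 mm reference, the function returns exactly the rest position `(0.35, -0.25, -0.270)`.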
The XY-plane tracking is only metrically 1:1 accurate when your hand is at the specific distance from the camera used during calibration (~1 meter). This is a natural consequence of camera perspective:
- Further away (>1m): A 10cm hand movement covers fewer pixels, resulting in a smaller robot movement.
- Closer (<1m): The same hand movement covers more pixels and is amplified into a larger robot movement.
For true 1:1 spatial mapping across all depths, the MOTION_SCALE would need to be dynamically adjusted based on the estimated Z-distance. The current fixed-scaler approach prioritizes simplicity and intuitive control feel over absolute metric accuracy.
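If dynamic adjustment were added, one hypothetical approach (not implemented in any of the scripts) would reuse the wrist-to-MCP distance from the Z-depth step as a proxy for relative camera distance and rescale `MOTION_SCALE` by it:

```python
MOTION_SCALE = 0.000423
HAND_WRIST_TO_MCP_MM = 80.0   # reference distance at calibration depth

def depth_compensated_scale(d_img_mm):
    """Hypothetical: shrink the XY scale when the hand is closer than the
    calibration distance (apparent wrist-MCP distance grows), and grow it
    when the hand is further away, keeping the mapping roughly metric."""
    return MOTION_SCALE * (HAND_WRIST_TO_MCP_MM / d_img_mm)
```

At the calibration distance this reduces to the fixed scale; closer in, where each hand millimetre covers more pixels, the scale drops proportionally to cancel the perspective amplification.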
| Script | Added feature |
|---|---|
| `v1_basic_xy_tracking.py` | XY hand tracking → robot, `cv2.imshow` window |
| `v2_grab_detection.py` | Grab detection via finger landmark comparison |
| `v3_unity_camera_stream.py` | Live camera feed streamed to Unity via TCP |
| `v4_axis_correction_autoreconnect.py` | Y-axis sign fix, mirroring, auto-reconnect on both TCP channels |
| `v5_z_depth_control.py` | Z-axis depth via wrist-MCP distance, personal hand calibration |
| `v6_resolution_fixed.py` | Math correctly scaled to 640×480 resolution; fixed Z-axis tracking |
| `v7_smooth_control.py` | EMA smoothing (anti-jitter filter), ergonomic `WORKSPACE_Z` adjustment |
Add this script to any GameObject in your scene. Assign a RawImage UI element
to the displayImage field in the Inspector. The script connects to the Python
camera server on port 55002 and auto-reconnects after disconnects.
Unity Faze4 robot simulator based on: kholodilinivan/Faze4-Robotic-arm-simulator
Python hand tracking, IK solving, and Unity streaming written independently on top of the simulator.