Train a Unitree Go2 quadruped robot to perform velocity-tracking locomotion using PPO in the Genesis GPU-parallel physics engine.
The policy learns to follow velocity commands [v_x, v_y, ω_z] while keeping the torso upright and the gait smooth. Training runs ~4096 environments in parallel on a single GPU and converges in roughly 5–10 minutes on an RTX 5060 Ti.
- Robot: Unitree Go2 (12 DoF: 4 legs × {hip, thigh, calf})
- Task: Velocity tracking — follow linear and angular velocity commands
- Engine: Genesis (GPU-accelerated, parallel envs)
- Algorithm: PPO via rsl_rl
- Observation: 45-dim (base ang vel, projected gravity, commands, joint pos/vel, last actions)
- Action: 12-dim joint position offsets (PD-controlled at 50 Hz)
- Linux (tested on Ubuntu 24.04 / kernel 6.17)
- NVIDIA GPU with CUDA support (tested on RTX 5060 Ti, 16 GB VRAM)
- Python 3.10+
- PyTorch 2.x with CUDA
# Clone the repo
git clone <your-repo-url>
cd StrikeRobot_genesis_development
# (Recommended) create a virtual environment
python -m venv .venv
source .venv/bin/activate
# Install dependencies
pip install -r requirements.txtrequirements.txt pulls in genesis-world, rsl-rl-lib, and tensorboard. PyTorch is expected to be installed separately to match your CUDA version.
Verify the install:
python -c "import genesis, rsl_rl, torch; print('OK, CUDA:', torch.cuda.is_available())"Headless training (fastest):
python scripts/train.py --exp_name go2_walk --max_iterations 500 --headlessCommon flags:
| Flag | Default | Description |
|---|---|---|
--exp_name |
go2_walk |
Subfolder under logs/ for this run |
--num_envs |
4096 |
Parallel environments (reduce if OOM) |
--max_iterations |
500 |
Number of PPO iterations |
--seed |
1 |
RNG seed |
--headless |
off | Disable viewer for max training speed |
In a second terminal:
tensorboard --logdir logs/Open http://localhost:6006. Watch for:
Train/mean_rewardtrending up (target:> 8.0)Train/mean_episode_lengthapproaching 1000 (target:> 800)
python scripts/eval.py --exp_name go2_walk --ckpt 500A Genesis viewer window opens and the trained Go2 walks following a velocity command.
StrikeRobot_genesis_development/
├── envs/
│ └── go2_env.py # Go2 environment (Genesis VecEnv)
├── configs/
│ └── go2_config.py # env / obs / reward / command / PPO configs
├── scripts/
│ ├── train.py # Training entry point
│ └── eval.py # Policy playback
├── docs/superpowers/specs/
│ └── 2026-05-12-go2-ppo-training-design.md # Design spec
├── logs/ # Checkpoints + Tensorboard (gitignored)
├── requirements.txt
└── README.md
Each PPO iteration:
- 4096 parallel Go2 environments step in lockstep on the GPU.
- The actor policy (MLP
45 → 512 → 256 → 128 → 12, ELU) outputs joint position offsets. - Genesis simulates physics at 200 Hz with
decimation=4(control at 50 Hz). - Rewards combine velocity tracking (positive) with penalties on bouncing, roll/pitch oscillation, body tilt, height deviation, action jerk, and joint-pose drift.
- rsl_rl's
OnPolicyRunnercollects 24 steps × 4096 envs ≈ 98k transitions and runs 5 epochs of PPO updates with adaptive KL learning-rate scheduling.
Full hyperparameters and reward weights are documented in the design spec.
- CUDA OOM: lower
--num_envs(e.g.2048). Training will be ~2× slower but still converges. import genesisfails: ensure you usedgenesis-world(not the unrelatedgenesisPyPI package). Reinstall withpip install --force-reinstall genesis-world.- Reward not increasing past iteration 100: check Tensorboard for
Loss/surrogateandPolicy/mean_noise_std. If noise collapsed early, lowerentropy_coefor increaseinit_noise_std. - Viewer is laggy in eval: that's expected with
num_envs=1and full rendering. Training itself should always be run with--headless.
- Genesis physics engine
- rsl_rl PPO implementation
- Legged Gym (reward design inspiration)
- Unitree Go2 robot
See LICENSE.