Please go through the whole process on an Ubuntu system.
We have released video tutorials for training and deploying a reinforcement learning policy. Please check them out on Bilibili or YouTube!
```bash
# install segmentation-fault debug tools (backward-cpp)
sudo apt-get install libdw-dev
wget https://raw.githubusercontent.com/bombela/backward-cpp/master/backward.hpp
sudo mv backward.hpp /usr/include
```
```bash
# install dependencies (Python 3.10)
pip install pybullet "numpy<2.0" mujoco
git clone --recurse-submodules https://github.com/DeepRoboticsLab/Lite3_rl_deploy.git
```
```bash
# compile
cd Lite3_rl_deploy
mkdir build && cd build
cmake .. -DBUILD_PLATFORM=x86 -DBUILD_SIM=ON -DSEND_REMOTE=OFF
# Explanation:
# -DBUILD_PLATFORM: device platform; Ubuntu is x86, the quadruped is arm
# -DBUILD_SIM: whether to use the simulator; when deploying on a real robot, set it to OFF
make -j
```

```bash
# run (open 2 terminals)
# Terminal 1 (pybullet)
cd interface/robot/simulation
python pybullet_simulation.py
# Terminal 1 (mujoco)
cd interface/robot/simulation
python mujoco_simulation.py
# Terminal 2
cd build
./rl_deploy
```

Tips: right-click the simulator window and select "always on top".
- z: default position
- c: RL control default position
- w/a/s/d: forward / leftward / backward / rightward
- q/e: clockwise / counterclockwise
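The key bindings above can be sketched as a mapping from keys to velocity commands. This is an illustrative Python sketch only (the project's actual keyboard handling is the C++ `KeyboardInterface`); the command tuples, signs, and magnitudes are assumptions for illustration:

```python
# Hypothetical key-to-command table: each key maps to (vx, vy, yaw_rate).
# Signs and magnitudes are illustrative assumptions, not the project's values.
KEY_COMMANDS = {
    "w": (1.0, 0.0, 0.0),   # forward
    "s": (-1.0, 0.0, 0.0),  # backward
    "a": (0.0, 1.0, 0.0),   # leftward
    "d": (0.0, -1.0, 0.0),  # rightward
    "q": (0.0, 0.0, -1.0),  # clockwise yaw
    "e": (0.0, 0.0, 1.0),   # counterclockwise yaw
}

def command_for(key: str) -> tuple:
    """Return (vx, vy, yaw_rate); unknown keys command a stop."""
    return KEY_COMMANDS.get(key, (0.0, 0.0, 0.0))
```

Mapping unknown keys to a zero command is a safe default for a legged robot: any unexpected input stops the motion request rather than continuing it.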
This process is almost identical to the simulation workflow above. You only need the extra step of connecting to the robot's Wi-Fi to transfer data, and then modifying the compilation options. Currently, the default real-robot control mode is the Retroid controller mode. If you need keyboard mode instead, change state_machine/state_machine.hpp line 121 to:

```cpp
uc_ptr_ = std::make_shared<KeyboardInterface>();
```

Then modify the file jy_exe/conf/network.toml to the following content to apply the code modification:

```toml
ip = '192.168.2.1'
target_port = 43897
local_port = 43893
ips = ['192.168.1.103']
ports = [43897]
```
```bash
# computer and gamepad should both connect to the Wi-Fi
# Wi-Fi: Lite*******
# Password: 12345678 (if wrong, contact technical support)
# use scp to transfer files to the quadruped (open a terminal on your local computer)
scp -r ~/Lite3_rl_deploy ysc@192.168.2.1:~/
```
```bash
# ssh connect for remote development
# Username: ysc
# Password: ' (a single quote)
ssh ysc@192.168.2.1
# enter the password; the terminal will then be active on the quadruped's onboard computer
```
```bash
# compile
cd Lite3_rl_deploy
mkdir build && cd build
cmake .. -DBUILD_PLATFORM=arm -DBUILD_SIM=OFF -DSEND_REMOTE=OFF
# Explanation:
# -DBUILD_PLATFORM: device platform; Ubuntu is x86, the quadruped is arm
# -DBUILD_SIM: whether to use the simulator; when deploying on a real robot, set it to OFF
make -j
./rl_deploy
```

For gamepad usage, please refer to https://github.com/DeepRoboticsLab/gamepad
To run a policy file trained with RL, you need to link against the onnxruntime library, which supports models in the .onnx format. Therefore, you must manually convert the .pt model to the .onnx format.
You can convert the .pt model to the .onnx model by running the pt2onnx.py file in the policy folder. Pay attention to the program output, which compares the consistency between the two models.
First, configure and verify the program runtime environment:

```bash
pip install torch numpy onnx onnxruntime
python3 -c 'import torch, numpy, onnx, onnxruntime; print("All modules OK")'
```

Then, run the program:

```bash
cd your/path/to/Lite3_rl_deploy/policy/
python pt2onnx.py
```

Afterward, you will see the corresponding .onnx model file in the current folder.
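The consistency check between the .pt and .onnx outputs boils down to an elementwise tolerance comparison. The following numpy-only sketch shows the idea; the function name and the tolerance value are illustrative assumptions, not pt2onnx.py's actual settings:

```python
import numpy as np

def outputs_match(pt_out, onnx_out, atol=1e-5):
    """Return True if the .pt and .onnx model outputs agree elementwise
    within an absolute tolerance (tiny drift is normal after export)."""
    pt_out = np.asarray(pt_out, dtype=np.float64)
    onnx_out = np.asarray(onnx_out, dtype=np.float64)
    return pt_out.shape == onnx_out.shape and np.allclose(pt_out, onnx_out, atol=atol)

# example: the same action vector, up to float noise
a = np.array([0.10, -0.25, 0.03])
print(outputs_match(a, a + 1e-7))  # tiny drift: models are consistent
print(outputs_match(a, a + 1e-2))  # large drift: export went wrong
```

If the outputs disagree beyond float noise, re-check that the dummy input shape used for export matches the policy's observation dimension.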
```mermaid
graph LR
A(Idle) --> B(StandUp) --> C(RL)
C --> D(JointDamping)
B --> D
D --> A
```
The state_machine module is where the Lite3 switches between different states; the states represent the following functions:
1. Idle: idle state, indicating that the robot's joints are not enabled.
2. StandUp: stand-up state, indicating the motion of the robot from sitting to standing.
3. RL: RL control state, indicating that the robot executes the actions output by the policy.
4. JointDamping: joint damping state, indicating that the robot's joints are under damping control.
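The allowed transitions in the diagram above can be expressed as a small table. This is an illustrative Python sketch (the actual implementation is the C++ state machine in state_machine/):

```python
# Allowed state transitions, mirroring the state diagram above:
# Idle -> StandUp -> RL -> JointDamping -> Idle, with a StandUp -> JointDamping fallback.
TRANSITIONS = {
    "Idle": {"StandUp"},
    "StandUp": {"RL", "JointDamping"},
    "RL": {"JointDamping"},
    "JointDamping": {"Idle"},
}

def can_transition(src: str, dst: str) -> bool:
    """True if the state machine may switch from src to dst."""
    return dst in TRANSITIONS.get(src, set())
```

Encoding the transitions as a table makes illegal requests (e.g. jumping straight from Idle to RL) easy to reject in one place.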
```mermaid
graph LR
A(Interface) --> B(Robot)
A --> C(User Command)
B --> D(simulation)
B --> E(hardware)
C --> F(gamepad)
C --> G(keyboard)
```
The interface module wraps the robot's data send/receive interface and the controller input. The robot-platform backends are divided into simulation and physical hardware, and the user-command backends are divided into keyboard and gamepad control.
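The simulation/hardware split above is a classic abstract-interface pattern. Here is an illustrative Python sketch of that structure (the project implements it in C++; all names, the 12-joint assumption, and the method signatures here are hypothetical):

```python
from abc import ABC, abstractmethod

class RobotInterface(ABC):
    """Common interface: the state machine talks to this, not to a backend."""
    @abstractmethod
    def get_observation(self) -> list: ...
    @abstractmethod
    def send_joint_command(self, cmd: list) -> None: ...

class SimulationInterface(RobotInterface):
    """Simulation backend; a HardwareInterface would implement the same methods."""
    def __init__(self):
        self.last_cmd = None
    def get_observation(self) -> list:
        return [0.0] * 12  # placeholder joint readings (12 joints assumed)
    def send_joint_command(self, cmd: list) -> None:
        self.last_cmd = cmd  # a real backend would step the simulator here

robot: RobotInterface = SimulationInterface()
robot.send_joint_command([0.1] * 12)
```

Because callers only depend on `RobotInterface`, switching between simulation and the real robot is a one-line change at construction time, which is exactly what the `-DBUILD_SIM` flag toggles at build time.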
```mermaid
graph LR
A(policy_runner_base) --> B(policy_runner)
```
This module executes the output of the RL policy; new policies can be implemented by inheriting from policy_runner_base.
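The inheritance pattern can be sketched as follows. This is an illustrative Python analog only (policy_runner_base is a C++ class in the project; the names and signatures here are hypothetical):

```python
from abc import ABC, abstractmethod

class PolicyRunnerBase(ABC):
    """Base class: each policy defines how an observation maps to an action."""
    @abstractmethod
    def run(self, observation: list) -> list: ...

class ZeroPolicyRunner(PolicyRunnerBase):
    """Trivial example policy: always outputs zero actions.
    A real subclass would run ONNX inference on the observation instead."""
    def run(self, observation: list) -> list:
        return [0.0] * len(observation)

runner: PolicyRunnerBase = ZeroPolicyRunner()
action = runner.run([0.2, -0.1, 0.05])
```

A new policy only needs to override `run`; the rest of the pipeline keeps calling the base-class interface unchanged.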
The policy runner is compiled as a separate static library (librun_policy.a). This means editing the policy only recompiles that one translation unit — the state machine and main.cpp object files are left untouched, making incremental rebuilds much faster.
The onnxruntime C++ API headers are included only inside run_policy/lite3_test_policy_runner_onnx.cpp (via a PIMPL pattern). Callers only see the thin lite3_test_policy_runner_onnx.h header, which has no onnxruntime dependency.
The default GNU linker (ld) is single-threaded. To speed up the link step, install mold and enable it with a CMake flag:
```bash
sudo apt install mold
# add -DUSE_MOLD_LINKER=ON to your cmake command, e.g.:
cmake .. -DBUILD_PLATFORM=x86 -DBUILD_SIM=ON -DUSE_MOLD_LINKER=ON
make -j
```

mold is a parallel linker and is typically 5–10× faster than ld for this project.