This project provides a simple tutorial for setting up, customizing, training, and visualizing a musculoskeletal finger model using MyoSuite, Gymnasium, and Stable-Baselines3.
Click the badge above to open the tutorial notebook directly in Google Colab. The notebook will automatically clone this repository and install all dependencies. Just run the cells sequentially!
- Colab_Tutorial.ipynb: An interactive Jupyter Notebook that runs the entire workflow in Google Colab.
- reward_tutorial.py: Defines a custom environment class (CustomRewardPoseEnv) that inherits from MyoSuite's PoseEnvV0. It demonstrates how to override the reward function and inject custom rewards (e.g., an "efficiency" reward).
- train.py: Trains a Proximal Policy Optimization (PPO) agent using Stable-Baselines3 on the custom environment defined in reward_tutorial.py.
- visualize.py: Loads the trained PPO model, runs it in the custom environment, and renders the episode. The rendered frames are saved as an MP4 video in the video/ directory.
This project uses uv as its package manager for extremely fast dependency resolution and environment isolation.
- Install uv if you haven't already:
    pip install uv
- Sync the environment and install the dependencies defined in pyproject.toml:
    uv sync

Alternatively, run a script directly with uv run, which sets up the environment automatically before executing it:
    uv run python train.py

The setup is designed so that you can easily modify what the agent "cares about" and then see the effect on its behavior.
Open reward_tutorial.py. Inside the __init__ method, you can adjust the default_weights or modify the custom_reward_weights passed in train.py. You can also inject entirely new reward logic inside the get_reward_dict function.
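The pattern described above (default weights, per-run overrides, and an extra reward term added in get_reward_dict) can be sketched in plain Python. This is only an illustration under stated assumptions: BasePoseEnv is a hypothetical stand-in rather than MyoSuite's PoseEnvV0, the method signatures are simplified, and the definition of the "efficiency" term is invented for the example. The actual code in reward_tutorial.py may differ.

```python
class BasePoseEnv:
    """Hypothetical stand-in for the base environment (not MyoSuite's PoseEnvV0)."""

    default_weights = {"pose": 1.0, "act_reg": 1.0}

    def __init__(self, custom_reward_weights=None):
        # Start from the defaults, then let the caller override or add weights.
        self.reward_weights = dict(self.default_weights)
        self.reward_weights.update(custom_reward_weights or {})

    def get_reward_dict(self, pose_error, activations):
        # Unweighted reward terms; each is <= 0, so 0 is the best possible value.
        return {
            "pose": -abs(pose_error),
            "act_reg": -sum(a * a for a in activations) / len(activations),
        }

    def total_reward(self, pose_error, activations):
        terms = self.get_reward_dict(pose_error, activations)
        # Weighted sum over every term that has a weight.
        return sum(
            weight * terms[name]
            for name, weight in self.reward_weights.items()
            if name in terms
        )


class CustomRewardSketchEnv(BasePoseEnv):
    """Adds an illustrative 'efficiency' term on top of the base terms."""

    default_weights = {"pose": 1.0, "act_reg": 1.0, "efficiency": 0.5}

    def get_reward_dict(self, pose_error, activations):
        terms = super().get_reward_dict(pose_error, activations)
        # Assumed definition: penalize the strongest single muscle activation.
        terms["efficiency"] = -max(activations)
        return terms


default_env = CustomRewardSketchEnv()
tuned_env = CustomRewardSketchEnv(custom_reward_weights={"efficiency": 2.0})

# Same state, different weights: the tuned agent is penalized more for effort.
print(default_env.total_reward(0.5, [0.0, 1.0]))
print(tuned_env.total_reward(0.5, [0.0, 1.0]))
```

Raising a weight makes the corresponding term dominate the total, which is the lever the tutorial exposes through default_weights and custom_reward_weights.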
Run the training script to train a new agent based on your customized reward logic:
    uv run python train.py

This will train the agent for 100,000 timesteps across 10 parallel environments and save the model to models/ppo_myofinger_custom_reward.zip.
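To get a feel for what this training budget means per environment, the arithmetic can be sketched as below. In Stable-Baselines3, the timestep budget is counted across all parallel environments, and PPO collects n_steps transitions from each environment before every policy update. The value n_steps=2048 is the library default and is an assumption here, since train.py may override it; rollout_budget is a hypothetical helper, not part of any library.

```python
import math


def rollout_budget(total_timesteps, n_envs, n_steps):
    """Estimate how a PPO timestep budget splits into rollouts and per-env steps."""
    # One rollout gathers n_steps transitions from each parallel environment.
    steps_per_rollout = n_envs * n_steps
    # Rollouts are collected whole, so the budget is rounded up to a full rollout.
    n_rollouts = math.ceil(total_timesteps / steps_per_rollout)
    return {
        "steps_per_rollout": steps_per_rollout,
        "n_rollouts": n_rollouts,
        "actual_timesteps": n_rollouts * steps_per_rollout,
        "steps_per_env": n_rollouts * n_steps,
    }


# 100,000 timesteps and 10 environments come from the text; n_steps is assumed.
budget = rollout_budget(total_timesteps=100_000, n_envs=10, n_steps=2048)
for key, value in budget.items():
    print(f"{key}: {value}")
```

Under these assumptions the run would consist of only a handful of policy-update rounds, which is worth keeping in mind when judging how much a reward change can affect the learned behavior.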
Once training is complete, run the visualization script to watch the agent's behavior:
    uv run python visualize.py

This will run the trained policy and render an off-screen video. The resulting video will be saved as video/episode_custom_reward.mp4. Open this file to see how your reward changes affected the finger's movement!
MyoSuite handles rendering natively through the MuJoCo physics engine (env.mj_renderer.render_offscreen). The visualize.py script fetches these frames and compiles them into a video using imageio.
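The fetch-then-compile flow can be sketched as a small loop that is independent of any particular renderer or video writer. Everything named here is hypothetical: record_episode, step_fn, render_fn, and write_fn are illustrative, and the fake callables below stand in for the real environment step, the off-screen render call, and the imageio writer. How visualize.py actually wires these together is not shown in this document.

```python
def record_episode(step_fn, render_fn, write_fn, max_steps):
    """Step an episode, grab one frame per step, then hand all frames to a writer.

    step_fn   -- advances the simulation by one step; returns True when the episode ends
    render_fn -- returns the current frame (for example, an RGB image array)
    write_fn  -- receives the full list of frames and stores them
    """
    frames = []
    for _ in range(max_steps):
        done = step_fn()
        frames.append(render_fn())
        if done:
            break
    write_fn(frames)
    return len(frames)


# Fake callables so the sketch runs without a simulator or a video encoder.
state = {"t": 0}


def fake_step():
    state["t"] += 1
    return state["t"] >= 3  # the pretend episode ends after three steps


def fake_render():
    return f"frame-{state['t']}"


saved_frames = []
n_frames = record_episode(fake_step, fake_render, saved_frames.extend, max_steps=10)
print(n_frames, saved_frames)
```

In a real script, render_fn would wrap the off-screen render call and write_fn would wrap a video writer such as imageio's; keeping them as parameters makes the loop easy to test without a GPU or display.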