Trait-Aware Diffusion Trait Steering (TADSRL)

This repository is a research fork of DSRL that makes a frozen diffusion policy steerable at test time by conditioning the noise policy on explicit traits. The diffusion policy stays fixed; only a trait-aware noise policy is learned, using reward shaping.

Core idea:

r = base_reward + sum_i m_i * lambda_i * r_i(...)

where r_i is the reward for trait i, lambda_i is its weight, and m is a binary mask that turns traits on/off. The noise policy is conditioned on t, a vector of trait values.
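
As a minimal sketch of the shaping formula, the snippet below computes the shaped reward for one step. The function name `shaped_reward` and its argument names are illustrative only and are not taken from this repository; the trait rewards are assumed to have been evaluated already.

```python
from typing import Sequence


def shaped_reward(
    base_reward: float,
    trait_rewards: Sequence[float],
    mask: Sequence[int],
    weights: Sequence[float],
) -> float:
    """Return base_reward + sum_i mask[i] * weights[i] * trait_rewards[i]."""
    if not (len(trait_rewards) == len(mask) == len(weights)):
        raise ValueError("trait_rewards, mask and weights must have equal length")
    # A masked-out trait (mask[i] == 0) contributes nothing to the sum.
    bonus = sum(m * lam * r for m, lam, r in zip(mask, weights, trait_rewards))
    return base_reward + bonus


# Two traits, only the first one switched on by the mask.
r = shaped_reward(1.0, trait_rewards=[0.5, -2.0], mask=[1, 0], weights=[2.0, 3.0])
print(r)  # 1.0 + 1*2.0*0.5 + 0*3.0*(-2.0) = 2.0
```

With the mask set to all zeros the result falls back to the base reward.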

Setup

Clone this repository

git clone --recurse-submodules <this-repo>
cd diffusion-trait-steering

Create a conda environment

conda create -n tadsrl python=3.9 -y
conda activate tadsrl

Install DPPO (diffusion policies)

cd dppo
pip install -e .
pip install -e ".[gym]"
cd ..

Install Stable Baselines3 (DSRL implementation)

cd stable-baselines3
pip install -e .
cd ..

Download diffusion policy checkpoints for DSRL from the original project and place them in ./dppo/log: https://drive.google.com/drive/folders/1kzC49RRFOE7aTnJh_7OvJ1K5XaDmtuh1

Trait-Aware Training (TADSRL)

Traits and their schedules are configured under the traits key in cfg/gym/dsrl_walker.yaml. Trait reward functions live in traits.py.

Run Walker2d training:

python train_dsrl.py --config-path=cfg/gym --config-name=dsrl_walker.yaml

Define traits (Python)

Each trait reward is a Python function that receives raw (unnormalized) observations:

def thigh_gap(raw_obs, info):
    # ... compute gap and reward from raw_obs ...
    return reward, {"gap": gap}

Traits are registered by name in traits.py.
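
A fuller, hedged sketch of what a trait function and a name-based registry might look like is shown below. Everything beyond the `(raw_obs, info) -> (reward, extras)` shape is an assumption: the `TRAITS` dict and `register_trait` decorator are hypothetical, the observation indices assume the standard Walker2d layout (indices 2 and 5 are the two thigh joint angles), and the `target_gap` key in `info` is invented for illustration. The real traits.py may differ on all of these points.

```python
from typing import Callable, Dict, Sequence, Tuple

TraitFn = Callable[[Sequence[float], dict], Tuple[float, dict]]

# Hypothetical registry; the real traits.py may register functions differently.
TRAITS: Dict[str, TraitFn] = {}


def register_trait(name: str) -> Callable[[TraitFn], TraitFn]:
    """Decorator that stores a trait function under the given name."""
    def decorator(fn: TraitFn) -> TraitFn:
        TRAITS[name] = fn
        return fn
    return decorator


# Assumed Walker2d layout: obs[2] and obs[5] are the two thigh joint angles.
RIGHT_THIGH, LEFT_THIGH = 2, 5


@register_trait("thigh_gap")
def thigh_gap(raw_obs: Sequence[float], info: dict) -> Tuple[float, dict]:
    # Gap between the thighs, measured as an absolute angle difference.
    gap = float(abs(raw_obs[RIGHT_THIGH] - raw_obs[LEFT_THIGH]))
    # "target_gap" is a hypothetical key; the reward peaks at 0 when gap == target.
    target = float(info.get("target_gap", 0.0))
    reward = -abs(gap - target)
    return reward, {"gap": gap}
```

A caller could then look the function up with `TRAITS["thigh_gap"]` and invoke it on each step's raw observation.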

Base reward override (optional)

You can replace the environment's per-step reward with a custom base reward function:

traits:
  base_reward_fn: healthy_reward

Base reward functions also live in traits.py.
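
For illustration only, a base reward function named `healthy_reward` could look like the sketch below. The signature is assumed to match the trait functions, and the thresholds are assumptions chosen to mirror Gymnasium's Walker2d defaults (torso height in obs[0], torso angle in obs[1]); the actual function in traits.py may be defined differently.

```python
from typing import Sequence

# Assumed thresholds, chosen to mirror Gymnasium's Walker2d defaults.
HEALTHY_Z_RANGE = (0.8, 2.0)
HEALTHY_ANGLE_RANGE = (-1.0, 1.0)


def healthy_reward(raw_obs: Sequence[float], info: dict) -> float:
    """Return 1.0 while the torso is upright and high enough, else 0.0."""
    z, angle = raw_obs[0], raw_obs[1]
    z_ok = HEALTHY_Z_RANGE[0] < z < HEALTHY_Z_RANGE[1]
    angle_ok = HEALTHY_ANGLE_RANGE[0] < angle < HEALTHY_ANGLE_RANGE[1]
    return 1.0 if (z_ok and angle_ok) else 0.0
```

A base reward like this drops the forward-progress term, so any locomotion behavior would have to come from the trait rewards.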

Phased mask training

Train traits incrementally with a mask schedule:

traits:
  schedule:
    min_steps: 250000
    patience: 3
    min_delta: 0.0
    phases:
      - mask: [1, 0]
      - mask: [0, 1]
      - mask: [1, 1]
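
The exact semantics of min_steps, patience, and min_delta are defined in the training code. One plausible reading, sketched below as an assumption, is early-stopping style: stay in a phase for at least min_steps, then advance once the evaluation return has failed to improve by more than min_delta for patience consecutive evaluations. The `MaskSchedule` class and its `update` method are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MaskSchedule:
    phases: List[List[int]]
    min_steps: int = 250_000
    patience: int = 3
    min_delta: float = 0.0
    phase_idx: int = 0
    phase_start_step: int = 0
    best: float = float("-inf")
    stale_evals: int = 0

    @property
    def mask(self) -> List[int]:
        return self.phases[self.phase_idx]

    def update(self, step: int, eval_return: float) -> List[int]:
        """Record one evaluation result and return the (possibly new) mask."""
        if eval_return > self.best + self.min_delta:
            self.best = eval_return
            self.stale_evals = 0
        else:
            self.stale_evals += 1

        long_enough = step - self.phase_start_step >= self.min_steps
        plateaued = self.stale_evals >= self.patience
        is_last = self.phase_idx == len(self.phases) - 1
        if long_enough and plateaued and not is_last:
            # Move to the next phase and reset the plateau tracking.
            self.phase_idx += 1
            self.phase_start_step = step
            self.best = float("-inf")
            self.stale_evals = 0
        return self.mask
```

Under this reading, the three phases above would train each trait alone before training both together, and the final phase runs until training ends.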

Inference with Traits

Use run_inference.py to set arbitrary trait values and masks at test time:

python run_inference.py --config-path=cfg/gym --config-name=dsrl_walker.yaml \
  model_path=/abs/path/to/ft_policy_XXXX_steps.zip \
  'trait_values=[0.6,1.2]' 'trait_mask=[1,1]' eval_episodes=5

To record videos:

python run_inference.py --config-path=cfg/gym --config-name=dsrl_walker.yaml \
  model_path=/abs/path/to/ft_policy_XXXX_steps.zip \
  record_video=true video_dir=videos_inference video_episodes=2

Trait-Aware Logging (W&B)

Logging is configured under traits.logging in cfg/gym/dsrl_walker.yaml. It includes:

Per-trait reward/value/mask statistics and shaping delta.
Action norm stats and correlation with traits.
Eval sweeps over trait values and cross-mask evals.
Auto-generated W&B plots (heatmap, elasticity, mask bar).
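
To make the idea of an eval sweep concrete, the sketch below enumerates every combination of trait values and masks that such a sweep might visit. The function `sweep_grid` is a hypothetical helper, not part of this repository; only the `trait_values` and `trait_mask` names are borrowed from the inference overrides.

```python
from itertools import product
from typing import Dict, List, Sequence


def sweep_grid(
    values_per_trait: Sequence[Sequence[float]],
    masks: Sequence[Sequence[int]],
) -> List[Dict[str, list]]:
    """Enumerate every (trait_values, trait_mask) pair to evaluate."""
    grid = []
    for mask in masks:
        # Cartesian product over the candidate values of each trait.
        for values in product(*values_per_trait):
            grid.append({"trait_values": list(values), "trait_mask": list(mask)})
    return grid


grid = sweep_grid([[0.2, 0.6], [1.2]], masks=[[1, 0], [1, 1]])
print(len(grid))  # 2 value combinations x 2 masks = 4
```

Each entry could then be evaluated for a fixed number of episodes and the returns arranged into a heatmap.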

Notes

Trait values are sampled per episode. Mask phases control which traits are active.
speed_ref (for Walker2d speed trait) should be estimated from the frozen policy and set in traits.py.
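
One simple way to estimate speed_ref, sketched here under assumptions, is to roll out the frozen policy, record the forward velocity at each step, and average. The helper `estimate_speed_ref` is hypothetical; it expects the per-step velocities to have been collected already (for example from the `x_velocity` entry that Gymnasium's Walker2d is assumed to report in `info`).

```python
from statistics import mean
from typing import Sequence


def estimate_speed_ref(episodes: Sequence[Sequence[float]]) -> float:
    """Average per-step forward velocity over all recorded steps."""
    velocities = [v for episode in episodes for v in episode]
    if not velocities:
        raise ValueError("need at least one recorded velocity")
    return mean(velocities)


# Two short recorded episodes of forward velocities.
print(estimate_speed_ref([[1.0, 2.0], [3.0]]))  # (1.0 + 2.0 + 3.0) / 3 = 2.0
```

Pooling all steps weights long episodes more heavily than short ones; averaging per-episode means first is an equally reasonable alternative.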

Acknowledgements

This fork builds on DSRL, Stable Baselines3, and DPPO.

Citation (DSRL)

@article{wagenmaker2025steering,
  author    = {Wagenmaker, Andrew and Nakamoto, Mitsuhiko and Zhang, Yunchu and Park, Seohong and Yagoub, Waleed and Nagabandi, Anusha and Gupta, Abhishek and Levine, Sergey},
  title     = {Steering Your Diffusion Policy with Latent Space Reinforcement Learning},
  journal   = {Conference on Robot Learning (CoRL)},
  year      = {2025},
}
