Move RL code from src/MaxText/rl/ to src/maxtext/trainers/post_train/rl/ #3180

Open
A9isha wants to merge 1 commit into main from anisha-rl-refactor

A9isha commented Feb 18, 2026

Description

Migrate RL training code to the new package structure following the same pattern as the SFT (PR #2988) and distillation moves. Old location files are replaced with backward-compatibility shims that delegate to the new modules with deprecation warnings.
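For readers unfamiliar with the pattern, the following is a minimal sketch of how a backward-compatibility shim can delegate to a relocated module while emitting a deprecation warning. It is not the code in this PR: the helper `install_shim` and the module names `demo_old_pkg_train_rl` and `demo_new_pkg_train_rl` are hypothetical stand-ins, and the in-memory modules exist only so the example runs on its own. A real shim would more likely be a small file left at the old path.

```python
import importlib
import sys
import types
import warnings


def install_shim(old_name: str, new_name: str) -> types.ModuleType:
    """Register a module at old_name that forwards attribute access to new_name."""
    shim = types.ModuleType(old_name)

    def __getattr__(attr):
        # Module-level __getattr__ (PEP 562) runs only when normal lookup fails,
        # so every forwarded attribute access emits the warning.
        warnings.warn(
            f"{old_name} is deprecated; import from {new_name} instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(importlib.import_module(new_name), attr)

    shim.__getattr__ = __getattr__
    sys.modules[old_name] = shim
    return shim


# Stand-in for the relocated module (hypothetical names, for demonstration only).
new_mod = types.ModuleType("demo_new_pkg_train_rl")
new_mod.main = lambda argv: f"ran with {len(argv)} args"
sys.modules["demo_new_pkg_train_rl"] = new_mod

# Callers that still use the old module name keep working, with a warning.
old_mod = install_shim("demo_old_pkg_train_rl", "demo_new_pkg_train_rl")
```

Calling `old_mod.main([...])` in this sketch runs the function defined on the new module and raises a `DeprecationWarning` that points at the caller, which is the behavior the description attributes to the shims.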

Tests

Locally ran RL using the following commands:

## new
python3 -m src.maxtext.trainers.post_train.rl.train_rl src/maxtext/configs/post_train/rl.yml \
  model_name=llama3.1-8b \
  tokenizer_path=meta-llama/Llama-3.1-8B-Instruct \
  load_parameters_path=/path/to/checkpoint \
  run_name=maz-8b-$RANDOM \
  base_output_directory=/path/to/storage \
  hf_access_token=<HF_TOKEN> dataset_name=gsm8k steps=4


## old
python3 -m src.MaxText.rl.train_rl src/maxtext/configs/post_train/rl.yml \
  model_name=llama3.1-8b \
  tokenizer_path=meta-llama/Llama-3.1-8B-Instruct \
  load_parameters_path=/path/to/checkpoint \
  run_name=maz-8b-$RANDOM \
  base_output_directory=/path/to/storage \
  hf_access_token=<HF_TOKEN> dataset_name=gsm8k steps=4

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

codecov bot commented Feb 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


bvandermoon left a comment

For the manual tests you ran, could you also try the old commands? This is not as critical as for the train.py shims, since RL is a newer feature, but it is still worth testing the shims since they are added here.

A9isha commented Feb 18, 2026

Done testing with the old command and updated the description - thanks @bvandermoon

A9isha force-pushed the anisha-rl-refactor branch from ec30219 to 81e5072 on February 18, 2026 22:04
A9isha force-pushed the anisha-rl-refactor branch from 81e5072 to 8918852 on February 18, 2026 22:38
@bvandermoon

> Done testing with the old command and updated the description - thanks @bvandermoon

Thanks @A9isha. Just to double-check, can you confirm you saw all the expected logs with the old command? For train.py, I needed to call logging.set_verbosity(logging.INFO) to see the standard completed-step output in the logs.
