open-thought/tiny-grpo

Minimal GRPO implementation

Goal: a working toy implementation of local RL training of llama-3.2-3b with GRPO, built to understand the algorithm and its hyperparameters. Everything runs locally on a single node.
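
For orientation, here is a minimal sketch of the two ideas at the core of GRPO: advantages computed relative to a group of completions sampled for the same prompt, and a PPO-style clipped surrogate term. It is not taken from train.py; the function names, the use of the population standard deviation, and the default values of eps and clip_eps are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions for the same prompt.

    Assumption: population standard deviation plus a small eps to avoid
    division by zero when all rewards in the group are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """PPO-style clipped term for one token: min(r * A, clip(r) * A).

    ratio is pi_new(token) / pi_old(token); clip_eps is a hypothetical default.
    """
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped_ratio * advantage)


# Four completions for one prompt, scored 1.0 (correct) or 0.0 (incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)  # correct completions get positive values, incorrect negative

# A ratio far above 1 is clipped, so the update cannot grow without bound.
print(clipped_surrogate(ratio=1.5, advantage=advantages[0]))
```

Because the baseline is the group mean, no separate value network is needed; a group in which every completion receives the same reward yields zero advantage and therefore no learning signal.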

Setup

  1. Create conda env
       conda create --name grpo python=3.12 -y
       conda activate grpo
  2. Install dependencies
       pip install -r requirements.txt
       pip install flash-attn --no-build-isolation
  3. Play with the source in train.py
       python train.py
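
Before launching training, it can help to confirm that the environment is set up as the steps above intend. The snippet below is a hedged sketch: the module list is an assumption (flash_attn is the import name of the flash-attn package; torch is assumed to be listed in requirements.txt), and missing_modules is a hypothetical helper, not part of this repository.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed module names; adjust to match the contents of requirements.txt.
expected = ["torch", "flash_attn"]

print("Python version:", ".".join(str(part) for part in sys.version_info[:3]))
print("Missing modules:", missing_modules(expected))
```

An empty list of missing modules only shows that the packages are installed; it does not verify that flash-attn was built against the installed CUDA and PyTorch versions.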
