TempFlow-GRPO (Temporal Flow GRPO) is a principled GRPO framework that captures and exploits the temporal structure inherent in flow-based generation. TempFlow-GRPO introduces two key innovations: (i) a trajectory branching mechanism that provides process rewards by concentrating stochasticity at designated branching points, enabling precise credit assignment without requiring specialized intermediate reward models; and (ii) a noise-aware weighting scheme that modulates policy optimization according to the intrinsic exploration potential of each timestep, prioritizing learning during high-impact early stages while ensuring stable refinement in later phases. These innovations endow the model with temporally aware optimization that respects the underlying generative dynamics, leading to state-of-the-art performance in human preference alignment and on standard text-to-image benchmarks.
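To make the noise-aware weighting idea concrete, here is a toy sketch (not the actual implementation; `noise_aware_weights`, the `gamma` exponent, and the normalization are all illustrative assumptions): group-standardized GRPO advantages are scaled per timestep so that high-noise early steps, which have more exploration potential, receive larger optimization weight.

```python
import numpy as np

def noise_aware_weights(sigmas, gamma=1.0):
    # Hypothetical scheme: weight each timestep by its noise level
    # (raised to gamma), normalized so the weights average to 1.
    w = np.asarray(sigmas, dtype=np.float64) ** gamma
    return w * len(w) / w.sum()

def weighted_grpo_advantages(rewards, sigmas):
    # GRPO-style group baseline: standardize rewards within the group,
    # then scale the shared advantage by the per-timestep noise weight.
    r = np.asarray(rewards, dtype=np.float64)
    adv = (r - r.mean()) / (r.std() + 1e-8)     # shape: (group_size,)
    w = noise_aware_weights(sigmas)             # shape: (num_timesteps,)
    return adv[:, None] * w[None, :]            # shape: (group, timestep)

# Toy example: a group of 4 samples over 5 denoising timesteps
rewards = [0.9, 0.2, 0.5, 0.4]
sigmas = [1.0, 0.8, 0.5, 0.3, 0.1]  # decreasing noise schedule
A = weighted_grpo_advantages(rewards, sigmas)
```

Early timesteps (large sigma) end up with larger-magnitude advantages than late ones, so policy updates concentrate where exploration matters most.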
We welcome ideas and contributions. Stay tuned!
We present an improved Flow-GRPO method, TempFlow-GRPO. We will release our code soon! 🔥🔥🔥
- [2025-08-06] We have released the first version of our paper. 🔥🔥🔥
- [2025-08-11] Thanks to Jie Liu for his comments on our paper. We will release the 1024 FLUX RL model this month. 🔥🔥🔥
- [2025-08-14] Our method also achieves better performance on FLUX at 1024px with HPSv3 (based on Qwen2-VL) as the reward. 🔥🔥🔥
- [2025-08-20] We have released the first version of our paper on Hugging Face. 🔥🔥🔥
- [2025-09-12] We will release the second version of our paper next week. 🔥🔥🔥
- [2025-09-17] We will release the code of our paper. 🔥🔥🔥
To support research and the open-source community, we will release the entire project—including datasets, training pipelines, and model weights. Thank you for your patience and continued support! 🌟
- Release arXiv paper
- Release GitHub repo
- Release training code
- Release neat training code
- Release model checkpoints
- First, download the reward model (we support CLIP-based PickScore, VLM-based HPSv3, ...) and the base model (SD3.5-M or FLUX.1-dev).
- Then, modify the noise level in `sd3_pipeline_with_logprob_perstep` and `sd3_pipeline_with_logprob`.
- Finally, modify the config. We suggest using 24 groups and 48 num groups.
Note that we use branch=4 with 6 exploration samples per branch; you can modify these in our code. We will release a cleaner code version in the next few days.
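The branching setup just mentioned (4 branches, exploration concentrated at the branching point) can be sketched as follows. This is a toy illustration, not the released training code: the velocity field, function names, and noise scale are all placeholders. Denoising is deterministic everywhere except the designated branching step, where noise is injected to fork the trajectory.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise_step(x, noise_scale=0.0):
    # Stand-in for one flow-matching update: a placeholder drift term,
    # deterministic unless noise is injected at a branching point.
    x = x + (-0.1 * x)
    if noise_scale > 0:
        x = x + noise_scale * rng.standard_normal(x.shape)
    return x

def branched_rollout(x0, num_steps=6, branch_step=2, num_branches=4):
    # Run deterministically up to the branching point, fork num_branches
    # noisy copies there, then finish each branch deterministically.
    x = x0
    for _ in range(branch_step):
        x = denoise_step(x)
    branches = [denoise_step(x.copy(), noise_scale=0.5)
                for _ in range(num_branches)]
    for _ in range(branch_step + 1, num_steps):
        branches = [denoise_step(b) for b in branches]
    return branches

branches = branched_rollout(np.zeros(3), num_branches=4)
```

Because every branch shares an identical prefix, differences in their final rewards can be credited to the stochastic choice made at the branching point, which is how process rewards are obtained without an intermediate reward model.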
```bash
# Flow-GRPO
bash scripts/multi_node/main.sh
# TempFlow-GRPO
bash scripts/multi_node/train_sd3_pr.sh
```

```bash
# Flow-GRPO
bash scripts/multi_node/train_flux.sh
# TempFlow-GRPO
bash scripts/multi_node/train_flux_pr.sh
```

- For more details, please read our paper.
Flow-GRPO: The first method integrating online reinforcement learning (RL) into flow matching models.



