
feature(sunjx): implement dynamic sampling strategy in DAPO #51

Open
Jiaxuan-Sun wants to merge 3 commits into opendilab:main from Jiaxuan-Sun:feature/dynamic-sampling-0306


@Jiaxuan-Sun
Contributor

Implement complete Dynamic Sampling (DAPO) for GRPO Training

This PR implements the dynamic sampling strategy from DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization) to improve GRPO training efficiency.

Key Features

  • Group filtering: Drops prompt groups in which every response has the same metric value (all correct or all incorrect). In such a group every response's group-relative advantage is zero, so the group contributes no gradient signal to relative policy optimization.
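
The group filtering described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function names `filter_uniform_groups` and `group_relative_advantages`, the dict-of-lists input layout, and the `eps` tolerance are all hypothetical, and the advantage is assumed to be the mean/std-normalized form commonly described for GRPO.

```python
from typing import Dict, List, Sequence


def group_relative_advantages(metrics: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Normalize each metric by the group's mean and (population) std.

    Assumed form: (m - mean) / (std + eps). If every value in the group
    is identical, every advantage is exactly 0.0.
    """
    mean = sum(metrics) / len(metrics)
    var = sum((m - mean) ** 2 for m in metrics) / len(metrics)
    std = var ** 0.5
    return [(m - mean) / (std + eps) for m in metrics]


def filter_uniform_groups(
    group_metrics: Dict[str, Sequence[float]], eps: float = 1e-8
) -> Dict[str, List[float]]:
    """Keep only prompt groups whose responses do not all share one metric value.

    `group_metrics` maps a (hypothetical) prompt id to the metric values of
    the responses sampled for that prompt.
    """
    kept: Dict[str, List[float]] = {}
    for prompt_id, metrics in group_metrics.items():
        if len(metrics) == 0:
            continue  # nothing sampled for this prompt, nothing to keep
        if max(metrics) - min(metrics) > eps:
            kept[prompt_id] = list(metrics)
    return kept


groups = {
    "p0": [1.0, 1.0, 1.0, 1.0],  # all correct: filtered out
    "p1": [0.0, 0.0, 0.0, 0.0],  # all incorrect: filtered out
    "p2": [1.0, 0.0, 0.0, 1.0],  # mixed: kept
}
kept_groups = filter_uniform_groups(groups)
```

In this sketch only `p2` survives the filter. A full dynamic sampling loop would typically keep sampling new prompts until enough non-uniform groups fill the batch; that resampling step is not shown here.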


Labels

enhancement New feature or request


2 participants