
feature(sunjx): implement dynamic sampling strategy in DAPO#40

Closed
Jiaxuan-Sun wants to merge 3 commits into opendilab:main from Jiaxuan-Sun:feature/dynamic-sampling

Conversation

@Jiaxuan-Sun
Contributor

Implement Dynamic Sampling (DAPO) for GRPO Training

This PR implements the dynamic sampling strategy from DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization) to improve GRPO training efficiency.

Key Features

  • Group filtering: Filters out prompt groups where all responses have the same metric value (all correct or all incorrect), as they provide no useful gradient information for relative policy optimization
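The group-filtering idea can be illustrated with a minimal sketch (all names here are illustrative, not the PR's actual code): when every response in a prompt group receives the same metric value, the group-relative advantages are all zero, so the group contributes no gradient and is dropped.

```python
# Hypothetical sketch of DAPO-style group filtering; `filter_groups` and its
# input shape are assumptions for illustration, not the PR's implementation.
def filter_groups(groups):
    """Keep only prompt groups whose per-response metric values are not all equal.

    `groups` maps prompt -> list of per-response metric values (e.g. 0/1 accuracy).
    """
    return {
        prompt: values
        for prompt, values in groups.items()
        if len(set(values)) > 1  # mixed outcomes -> non-zero relative advantages
    }

groups = {
    "p1": [1, 1, 1, 1],  # all correct   -> filtered out
    "p2": [0, 0, 0, 0],  # all incorrect -> filtered out
    "p3": [1, 0, 1, 0],  # mixed         -> kept
}
print(filter_groups(groups))  # {'p3': [1, 0, 1, 0]}
```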

@puyuan1996 puyuan1996 changed the title Feature(sunjx): Implement Dynamic Sampling (DAPO) for GRPO Training feature(sunjx): implement dynamic sampling strategy in DAPO Feb 10, 2026
if "accuracy" in exp.info:
    metric_values = exp.info["accuracy"]
else:
    # Fallback: treat reward as binary accuracy
Member

Raise a RuntimeError here instead of using a fallback.
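The reviewer's suggestion, sketched under assumed names (`get_metric_values` and the `exp.info` shape are stand-ins for the PR's objects): fail fast when the metric is missing rather than silently reinterpreting the reward.

```python
from types import SimpleNamespace

def get_metric_values(exp):
    # Fail fast per the review: no silent fallback to reward-as-accuracy.
    if "accuracy" in exp.info:
        return exp.info["accuracy"]
    raise RuntimeError(
        "Dynamic sampling requires an 'accuracy' metric in exp.info; "
        "refusing to fall back to treating reward as binary accuracy."
    )

exp = SimpleNamespace(info={"accuracy": [1, 0, 1]})
print(get_metric_values(exp))  # [1, 0, 1]
```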

f"Warning: Dynamic sampling needs more batches, but current implementation "
f"processes one batch at a time. Proceeding with {num_valid_prompts} valid prompts."
)
break
Member

If you add `break` here, this `while` loop is unnecessary and can be omitted. Maybe you should use `continue` in the `for` loop instead: only when `num_valid_prompts >= target_num_prompts` should the following code be executed.
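One way to read this suggestion is the following sketch (function and variable names are hypothetical, not the trainer's real API): keep accumulating valid prompts with `continue`, and only fall through to the training step once the target count is reached.

```python
# Illustrative restructuring along the review comment: accumulate across
# batches instead of breaking out early with a partial set of valid prompts.
def collect_valid_prompts(batches, target_num_prompts, is_valid):
    valid_prompts = []
    for batch in batches:
        valid_prompts.extend(p for p in batch if is_valid(p))
        if len(valid_prompts) < target_num_prompts:
            continue  # not enough yet: sample/process the next batch
        # Enough valid prompts accumulated; proceed to the training step.
        return valid_prompts[:target_num_prompts]
    # Ran out of batches before reaching the target.
    return valid_prompts

# Toy usage: "valid" here just means even-valued.
batches = [[1, 2, 3], [4, 5, 6], [7, 8]]
print(collect_valid_prompts(batches, 3, lambda p: p % 2 == 0))  # [2, 4, 6]
```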

Member

I think there are bugs in your previous experiment; the current implementation cannot accumulate enough data for training when dynamic_sampling is enabled.

output = self.tokenizer.batch_decode(
experience.sequences[0].unsqueeze(0), skip_special_tokens=True
)
self.strategy.print("collect phase: experience.sequences w skip_special_tokens: ", output)
Member

Remove these two print statements.

@@ -369,30 +369,85 @@ def fit(
f"rand_prompts:\n {rand_prompts}\n , rand_images:{rand_images}\n , rand_references:{rand_references}\n, rand_labels:{rand_labels}\n " # noqa
Member

You should also add a similar implementation in ppo_trainer.py.

@puyuan1996
Collaborator

We have a new PR: #51

@puyuan1996 puyuan1996 closed this Mar 11, 2026