[doc] feat: add effectiveness figures and example recipes to README #26

Merged
Luosuu merged 1 commit into verl-project:main from Luosuu:doc/readme-results-and-recipes
May 13, 2026

Collaborator

@Luosuu Luosuu commented May 13, 2026

What does this PR do?

Concise overview of the change. Reference related issues/PRs.

Checklist Before Starting

  • Search for related PRs/issues and link here: ...
  • PR title follows [{modules}] {type}: {description} format (see check_pr_title.py for the full list of allowed modules and types)
    • Breaking changes: prepend [BREAKING] — e.g. [BREAKING][ops] feat: new batch-invariant matmul API

Test

Validation results (numeric checks, benchmark metrics) for changes not covered by CI.

API and Usage Example

Show API changes and usage examples if applicable.

Design & Code Changes

High-level design description and specific change list.

Checklist Before Submitting

  • Read the Contribute Guide
  • Applied pre-commit checks (pre-commit run --all-files)
  • Added/updated documentation
  • Added tests to CI workflow (or explained why not feasible)

@Luosuu Luosuu merged commit 67b5380 into verl-project:main May 13, 2026
2 checks passed

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request enhances the README.md by adding an 'Effectiveness' section with performance comparisons and an 'Example Recipes' section containing a comprehensive table of training scripts. The review feedback identified several areas for improvement, including ensuring model naming consistency, adding a required environment variable to an example command, and correcting an algorithm label in the recipe table to align with the actual script implementation.

Comment thread README.md

## Effectiveness

> **Qwen3-30B-A3B · REINFORCE++ · DAPO dataset**

medium

For consistency with the recipe table below (line 53), the model name should include the -Base suffix when referring to the REINFORCE++ experiment.

Suggested change
- > **Qwen3-30B-A3B · REINFORCE++ · DAPO dataset**
+ > **Qwen3-30B-A3B-Base · REINFORCE++ · DAPO dataset**

Comment thread README.md
```bash
bash examples/getting_started/run_qwen3_1b7.sh
# override paths via env vars
model_dir=/path/to/model data_dir=/path/to/data bash examples/moe/run_qwen3_30B_A3B_dapo.sh
```

medium

The example command is missing the test_path environment variable. The script examples/moe/run_qwen3_30B_A3B_dapo.sh explicitly requires test_path to locate the validation dataset (see line 36 of that script). Without it, the command will fail for users who do not have the default Arnold-style mount.

Suggested change
- model_dir=/path/to/model data_dir=/path/to/data bash examples/moe/run_qwen3_30B_A3B_dapo.sh
+ model_dir=/path/to/model data_dir=/path/to/data test_path=/path/to/test bash examples/moe/run_qwen3_30B_A3B_dapo.sh
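
One way a launch script could surface a missing override early is to validate its path variables before starting training. The sketch below is an illustration only: `check_recipe_env` is a hypothetical helper, not code from `run_qwen3_30B_A3B_dapo.sh`, and it assumes the three variables named in this thread (`model_dir`, `data_dir`, `test_path`) are the ones a user must supply.

```bash
# Hypothetical helper, not taken from the actual recipe script.
# Returns non-zero and prints a message to stderr when a required
# path variable is unset or empty.
check_recipe_env() {
  # ${var:?msg} makes a non-interactive shell exit with msg on stderr
  # when var is unset or empty. Running the checks in a subshell keeps
  # that exit from terminating the caller.
  (
    : "${model_dir:?set model_dir, e.g. model_dir=/path/to/model}"
    : "${data_dir:?set data_dir, e.g. data_dir=/path/to/data}"
    : "${test_path:?set test_path, e.g. test_path=/path/to/test}"
  )
}
```

Calling `check_recipe_env || exit 1` near the top of a script would stop the run with a message naming the first missing variable, rather than failing later on a path that does not exist.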

Comment thread README.md
| Recipe | Model | Dataset | Hardware | Algorithm |
|---|---|---|---|---|
| [`getting_started/run_qwen3_1b7.sh`](examples/getting_started/run_qwen3_1b7.sh) | Qwen3-1.7B | gsm8k | 1×8H100 | GRPO |
| [`moe/run_qwen3_30B_A3B_dapo.sh`](examples/moe/run_qwen3_30B_A3B_dapo.sh) | Qwen3-30B-A3B | DAPO-Math-17k / AIME 2025 | 1×8H100 | DAPO |

medium

The algorithm for this recipe is listed as DAPO, but the corresponding script examples/moe/run_qwen3_30B_A3B_dapo.sh sets algorithm.adv_estimator=grpo (line 34). Please ensure the algorithm name in the table accurately reflects the implementation in the script.

Suggested change
- | [`moe/run_qwen3_30B_A3B_dapo.sh`](examples/moe/run_qwen3_30B_A3B_dapo.sh) | Qwen3-30B-A3B | DAPO-Math-17k / AIME 2025 | 1×8H100 | DAPO |
+ | [`moe/run_qwen3_30B_A3B_dapo.sh`](examples/moe/run_qwen3_30B_A3B_dapo.sh) | Qwen3-30B-A3B | DAPO-Math-17k / AIME 2025 | 1×8H100 | GRPO |
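
A check like the one described in this comment can be reproduced by reading the estimator setting straight from the script. The following is a minimal sketch: `adv_estimator_of` is a hypothetical helper name, and it assumes the override appears as an unquoted `algorithm.adv_estimator=<value>` token, as quoted in the comment above.

```bash
# Hypothetical helper: print the first value assigned to
# algorithm.adv_estimator in the given launch script.
# Assumes the value is unquoted and made of letters, digits, _, + or -.
adv_estimator_of() {
  sed -n 's/.*algorithm\.adv_estimator=\([A-Za-z0-9_+-]*\).*/\1/p' "$1" | head -n 1
}
```

Running `adv_estimator_of examples/moe/run_qwen3_30B_A3B_dapo.sh` from the repository root prints the configured estimator, which can then be compared against the Algorithm column of the recipe table.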
