diff --git a/README.md b/README.md
index 44ff1a4..a6dd7c8 100644
--- a/README.md
+++ b/README.md
@@ -11,6 +11,55 @@ Transformer-based bitwise-aligned rollout for VeOmni FSDP with VeRL integration.
 - 🧩 **Simple model definitions** — Transformer model code is self-contained and easy to audit, so training and inference model definitions stay in sync
 - 📖 **Readable codebase** — Clean implementation with chunked prefill, pipeline parallelism, and CUDA graph support
+## Effectiveness
+
+> **Qwen3-30B-A3B · REINFORCE++ · DAPO dataset**
+
+Off-policy logprob bias from vLLM causes the rollout-correction KL to explode after ~300 steps, which triggers gradient norm blow-up and ultimately training collapse. VeXact's bitwise-aligned rollout keeps the KL at exactly zero throughout, yielding stable training and a ~2× higher final AIME 2024 score.
+
+[Figure: 2×2 grid of plots (images not preserved). Panels: Training reward; AIME 2024 (mean@32); Rollout-correction K3 KL (log scale); Gradient norm (log scale).]
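
For readers unfamiliar with the "K3 KL" named in the panel title, the following is a minimal sketch of the K3 estimator, `(r - 1) - log r` with `r = p_train / p_rollout`, evaluated on tokens sampled by the rollout engine. It is not taken from VeXact or VeRL: the function name `k3_kl`, the choice of KL direction, the mean reduction, and the log-prob values are all assumptions made for illustration. It shows why identical log-probs give a KL of exactly zero, while any mismatch gives a positive value.

```python
import math
from typing import Sequence


def k3_kl(train_logprobs: Sequence[float], rollout_logprobs: Sequence[float]) -> float:
    """Mean K3 estimate of KL(rollout || train) over rollout-sampled tokens.

    Hypothetical helper for illustration; not part of any library API.
    """
    if len(train_logprobs) != len(rollout_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    if not train_logprobs:
        raise ValueError("need at least one token")
    total = 0.0
    for lp_train, lp_rollout in zip(train_logprobs, rollout_logprobs):
        log_ratio = lp_train - lp_rollout  # log(p_train / p_rollout)
        # (r - 1) - log r, non-negative in exact arithmetic
        total += math.expm1(log_ratio) - log_ratio
    return total / len(train_logprobs)


# Made-up per-token log-probs, for illustration only.
rollout = [-0.50, -1.20, -0.05, -2.30]
aligned = list(rollout)                 # trainer reproduces the rollout values exactly
drifted = [-0.52, -1.15, -0.05, -2.60]  # trainer disagrees slightly with the rollout

kl_aligned = k3_kl(aligned, rollout)
kl_drifted = k3_kl(drifted, rollout)
print(kl_aligned, kl_drifted)
```

With identical inputs every log-ratio is `0.0`, so each term is `expm1(0.0) - 0.0` and the estimate is zero with no floating-point slack, which is the property a bitwise-aligned rollout relies on.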