feat:enable v2 training pipeline with controller parity by sitabulaixizawaluduo · Pull Request #1363 · areal-project/AReaL

sitabulaixizawaluduo · 2026-05-25T06:59:54Z

Changes

GatewayTrainController: add version management (set_version / get_version), connect_engine, clear_batches; persist guard addresses for later port allocation; unify HTTP client session (follow-up to feat: controller v2 refactor #1354).
Rollout entrypoints (rl_trainer, sglang_remote, vllm_remote): route to RolloutControllerV2 when config._version == "v2".
Weight update controller: small adjustments to the connect method for the v2 path.
Examples: add an agent config section to all examples/math/*.yaml; switch the default workflow in gsm8k_rl.py to MathAgent.
Reward: remove obsolete get_custom_reward_fn from areal/reward/__init__.py.
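The entrypoint routing rule can be pictured as a dispatch on config._version. The sketch below is a hypothetical illustration, not AReaL's code. RolloutController and RolloutControllerV2 are empty stand-in classes, and select_rollout_controller_cls is an invented helper. Treating a config without _version as the v1 path is also an assumption.

```python
from types import SimpleNamespace


class RolloutController:
    """Hypothetical stand-in for the v1 rollout controller."""


class RolloutControllerV2:
    """Hypothetical stand-in for the v2 rollout controller."""


def select_rollout_controller_cls(config) -> type:
    # Route to the v2 controller only when the config explicitly asks for it.
    # Assumption: a config without `_version` keeps the v1 path.
    if getattr(config, "_version", None) == "v2":
        return RolloutControllerV2
    return RolloutController


if __name__ == "__main__":
    cfg = SimpleNamespace(_version="v2")
    print(select_rollout_controller_cls(cfg).__name__)
```

Keeping the decision in one small function means every entrypoint (rl_trainer, sglang_remote, vllm_remote) would apply the same rule.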

gemini-code-assist

Code Review

This pull request introduces 'v2' controller support, integrating RolloutControllerV2 and implementing RL parity methods like connect_engine and update_weights in the GatewayTrainController. It also refactors weight update logic, removes legacy reward utilities, and updates example configurations and tests. Feedback identifies a critical AttributeError in connect_engine due to an undefined attribute and notes that update_weights incorrectly invokes asynchronous methods synchronously while containing redundant generation control logic that may conflict with the trainer's state.

gemini-code-assist · 2026-05-25T07:03:25Z

+        )
+        ctrl.initialize()
+
+        inference_urls: list[str] = rollout.inference_worker_urls


The attribute inference_worker_urls is not defined in RolloutControllerV2, which will cause an AttributeError at runtime. You should use the internal _inf_addrs attribute or add a public property to RolloutControllerV2 to expose these URLs.

Suggested change

-        inference_urls: list[str] = rollout.inference_worker_urls
+        inference_urls: list[str] = rollout._inf_addrs
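The comment also offers a second option: a public property on RolloutControllerV2. That option could look like the sketch below. It assumes only what the review states, namely that the controller keeps its inference addresses in _inf_addrs. The class name RolloutControllerV2Stub and its constructor are hypothetical. Returning a copy keeps callers from mutating the controller's internal list.

```python
class RolloutControllerV2Stub:
    """Hypothetical stand-in that stores inference addresses in `_inf_addrs`."""

    def __init__(self, inf_addrs: list[str]) -> None:
        self._inf_addrs = list(inf_addrs)

    @property
    def inference_worker_urls(self) -> list[str]:
        # Public, read-only view of the internal address list (returned as a copy).
        return list(self._inf_addrs)
```

With such a property in place, the original line `rollout.inference_worker_urls` would work unchanged, and connect_engine would not need to read a private attribute.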

gemini-code-assist · 2026-05-25T07:03:25Z

+        self.rollout.pause_generation()
+        assert meta.version is not None and meta.version > 0, (
+            f"meta.version must be a positive integer, got {meta.version}"
+        )
+        result = self._weight_update_ctrl.update_weights(version=meta.version)
+        self.rollout.continue_generation()


This block has multiple issues:

pause_generation() and continue_generation() are coroutine methods, but they are called without await. Each call therefore only creates a coroutine object that never runs, and the generation state never changes.

These calls are redundant and potentially harmful here. PPOTrainer.train already manages the rollout pause/resume state. In RolloutControllerV2, pause() correctly stops generation. Resuming it here via continue_generation() would break the trainer's expectation that inference remains paused during the subsequent save and evaluation steps.

The assert should be replaced with an explicit runtime check, because assertions are stripped when Python runs with -O.

Suggested change

if meta.version is None or meta.version <= 0:
    raise ValueError(f"meta.version must be a positive integer, got {meta.version}")
result = self._weight_update_ctrl.update_weights(version=meta.version)
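The three points above can be reproduced in a small, self-contained sketch. FakeRollout, require_positive_version, update_weights and train_step are hypothetical stand-ins, not AReaL APIs. The sketch takes only two things from the review: pause_generation() and continue_generation() are coroutine methods, and the trainer, not update_weights, owns the pause/resume state. Calling a coroutine method without await leaves the rollout unpaused. In the trainer-owned step, inference stays paused through save/eval.

```python
from __future__ import annotations

import asyncio


class FakeRollout:
    """Hypothetical rollout whose generation controls are coroutine methods."""

    def __init__(self) -> None:
        self.paused = False
        self.events: list[str] = []

    async def pause_generation(self) -> None:
        self.paused = True
        self.events.append("pause")

    async def continue_generation(self) -> None:
        self.paused = False
        self.events.append("resume")


def require_positive_version(version: int | None) -> int:
    # Explicit check: unlike `assert`, it is not stripped by `python -O`.
    if version is None or version <= 0:
        raise ValueError(f"meta.version must be a positive integer, got {version}")
    return version


def update_weights(rollout: FakeRollout, version: int | None) -> int:
    # Hypothetical controller method: validates and records the update,
    # but leaves pause/resume to the caller.
    version = require_positive_version(version)
    rollout.events.append(f"update v{version}")
    return version


async def train_step(rollout: FakeRollout, version: int) -> None:
    # Hypothetical trainer step: pause once and stay paused through save/eval.
    await rollout.pause_generation()
    update_weights(rollout, version)
    rollout.events.append(f"save/eval paused={rollout.paused}")
    await rollout.continue_generation()


if __name__ == "__main__":
    rollout = FakeRollout()
    coro = rollout.pause_generation()  # no await: the body has not run
    print(rollout.paused)              # still False
    coro.close()                       # discard it without a "never awaited" warning
    asyncio.run(train_step(rollout, version=1))
    print(rollout.events)
```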

Bring GatewayTrainController and RolloutControllerV2 to full parity with v1 controllers for RL training paths.

Key changes:
- Route to RolloutControllerV2 when config._version == "v2"
- Add version management, connect_engine, clear_batches to GatewayTrainController
- Unify HTTP client session in GatewayTrainController (follows PR #1354)
- Switch default workflow to MathAgent in example configs
- Add agent config section to all example YAML files
- Remove obsolete get_custom_reward_fn from reward module
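The PR lists set_version / get_version on GatewayTrainController without further detail. The following minimal sketch assumes the controller only needs to remember the latest weight version. VersionTracker is a hypothetical name. Two further choices are assumptions, not behaviour confirmed by this PR: the lock (in case other threads read the version while the trainer writes it) and the non-negative check.

```python
import threading


class VersionTracker:
    """Hypothetical holder for a controller's current weight version."""

    def __init__(self, initial: int = 0) -> None:
        self._version = initial
        self._lock = threading.Lock()

    def set_version(self, version: int) -> None:
        # Assumption: versions are non-negative integers.
        if version < 0:
            raise ValueError(f"version must be non-negative, got {version}")
        with self._lock:
            self._version = version

    def get_version(self) -> int:
        with self._lock:
            return self._version
```

In this sketch, a trainer would call set_version(meta.version) after a successful weight update, and any later get_version() call would return that value.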

TaoZex

LGTM

fishcrap

LGTM

sitabulaixizawaluduo requested reviews May 25, 2026 06:59 from code owners CormickKneey, HwVanICI, PrometheusComing, garrett4wade, rchardx, TaoZex, fishcrap, nuzant and geshi001, and from guozhihao-224

gemini-code-assist Bot reviewed May 25, 2026

sitabulaixizawaluduo force-pushed the feat/training-controller-v2-parity branch from ce21c49 to 2e4dffd May 25, 2026 07:19

sitabulaixizawaluduo marked this pull request as draft May 25, 2026 07:20

sitabulaixizawaluduo marked this pull request as ready for review May 26, 2026 06:21

sitabulaixizawaluduo added 2 commits May 26, 2026 14:21

fix: update wu controller connect method

2541bbd

sitabulaixizawaluduo force-pushed the feat/training-controller-v2-parity branch from 2e4dffd to 2541bbd May 26, 2026 06:21

sitabulaixizawaluduo added the safe-to-test label (ready to run unit tests in a PR) May 26, 2026

sitabulaixizawaluduo had a problem deploying to AReaL-unittests May 26, 2026 06:46 with GitHub Actions (Failure)

chore: unblock CI for grpo and grpo_lora with admin key + lora name

eca4c31

sitabulaixizawaluduo removed and re-added the safe-to-test label May 26, 2026

sitabulaixizawaluduo changed the title from "feat: enable v2 training pipeline with controller parity" to "feat:enable v2 training pipeline with controller parity" May 26, 2026

sitabulaixizawaluduo closed this May 26, 2026

sitabulaixizawaluduo reopened this May 26, 2026

sitabulaixizawaluduo removed and re-added the safe-to-test label May 26, 2026

sitabulaixizawaluduo had a problem deploying to AReaL-unittests May 26, 2026 16:22 with GitHub Actions (Failure)

chore: unblock CI for v2 parity

d7a834c

sitabulaixizawaluduo removed and re-added the safe-to-test label May 27, 2026

sitabulaixizawaluduo deployed to AReaL-unittests May 27, 2026 07:28 with GitHub Actions (Active)

sitabulaixizawaluduo temporarily deployed to AReaL-unittests May 27, 2026 07:28 with GitHub Actions (Inactive)

TaoZex approved these changes May 29, 2026

fishcrap approved these changes May 29, 2026

sitabulaixizawaluduo merged commit 0cfcd04 into main May 29, 2026
13 checks passed

sitabulaixizawaluduo deleted the feat/training-controller-v2-parity branch May 29, 2026 05:56

sitabulaixizawaluduo merged 4 commits into main from feat/training-controller-v2-parity

3 participants
