feat:enable v2 training pipeline with controller parity #1363

Merged
sitabulaixizawaluduo merged 4 commits into main from feat/training-controller-v2-parity on May 29, 2026

@sitabulaixizawaluduo (Collaborator) commented May 25, 2026

Changes

  • GatewayTrainController: add version management (set_version / get_version), connect_engine, clear_batches; persist guard addresses for later port allocation; unify HTTP client session (follow-up to feat: controller v2 refactor #1354).
  • Rollout entrypoints (rl_trainer, sglang_remote, vllm_remote): route to RolloutControllerV2 when config._version == "v2".
  • Weight update controller: small adjustments to the connect method for the v2 path.
  • Examples: add an agent config section to all examples/math/*.yaml; switch the default workflow in gsm8k_rl.py to MathAgent.
  • Reward: remove obsolete get_custom_reward_fn from areal/reward/__init__.py.
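The routing rule in the rollout entrypoints can be pictured with a minimal sketch. Only `config._version` and the name `RolloutControllerV2` come from this PR; `RolloutConfig`, the v1 `RolloutController` placeholder, and `make_rollout_controller` are hypothetical names used for illustration, and the real entrypoints may be structured differently.

```python
from dataclasses import dataclass


@dataclass
class RolloutConfig:
    """Hypothetical stand-in for the real rollout config."""

    _version: str = "v1"


class RolloutController:
    """Placeholder for the v1 controller (name assumed)."""

    def __init__(self, config: RolloutConfig) -> None:
        self.config = config


class RolloutControllerV2:
    """Placeholder for the v2 controller named in this PR."""

    def __init__(self, config: RolloutConfig) -> None:
        self.config = config


def make_rollout_controller(config: RolloutConfig):
    # Only the exact string "v2" selects the new controller;
    # a missing or different value falls back to the v1 path.
    if getattr(config, "_version", "v1") == "v2":
        return RolloutControllerV2(config)
    return RolloutController(config)


v1_ctrl = make_rollout_controller(RolloutConfig())
v2_ctrl = make_rollout_controller(RolloutConfig(_version="v2"))
```

Keeping the check in one helper means the three entrypoints would not need to repeat the version comparison, though the PR does not say whether the actual code is factored this way.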

@gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request introduces 'v2' controller support, integrating RolloutControllerV2 and implementing RL parity methods like connect_engine and update_weights in the GatewayTrainController. It also refactors weight update logic, removes legacy reward utilities, and updates example configurations and tests. Feedback identifies a critical AttributeError in connect_engine caused by an undefined attribute. It also notes that update_weights calls asynchronous methods without awaiting them and contains redundant generation control logic that may conflict with the trainer's state.

)
ctrl.initialize()

inference_urls: list[str] = rollout.inference_worker_urls

critical

The attribute inference_worker_urls is not defined in RolloutControllerV2, which will cause an AttributeError at runtime. You should use the internal _inf_addrs attribute or add a public property to RolloutControllerV2 to expose these URLs.

Suggested change
- inference_urls: list[str] = rollout.inference_worker_urls
+ inference_urls: list[str] = rollout._inf_addrs
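The other option mentioned in the comment, a public property on RolloutControllerV2, might look roughly like the sketch below. The class here is a toy stand-in: only the names `_inf_addrs` and `inference_worker_urls` come from the review, and the constructor signature and example URL are assumptions made for illustration.

```python
class RolloutControllerV2:
    """Toy stand-in; the real class has a different constructor."""

    def __init__(self, inf_addrs: list[str]) -> None:
        # Internal list of inference worker addresses.
        self._inf_addrs = list(inf_addrs)

    @property
    def inference_worker_urls(self) -> list[str]:
        # Return a copy so callers cannot mutate the internal list.
        return list(self._inf_addrs)


rollout = RolloutControllerV2(["http://127.0.0.1:30000"])
inference_urls: list[str] = rollout.inference_worker_urls
```

A read-only property keeps callers such as connect_engine off the private attribute, at the cost of one extra list copy per access.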

Comment on lines +1008 to +1013
self.rollout.pause_generation()
assert meta.version is not None and meta.version > 0, (
f"meta.version must be a positive integer, got {meta.version}"
)
result = self._weight_update_ctrl.update_weights(version=meta.version)
self.rollout.continue_generation()

high

This block has multiple issues:

  1. The calls to pause_generation() and continue_generation() are asynchronous but called synchronously, returning coroutines without executing them.
  2. These calls are redundant and potentially harmful here. PPOTrainer.train already manages the rollout pause/resume state. In RolloutControllerV2, pause() correctly stops generation. Resuming it here via continue_generation() would break the trainer's expectation that inference remains paused during the subsequent save and evaluation steps.
  3. The assert should be replaced with a proper runtime check as assertions can be disabled in production.
        if meta.version is None or meta.version <= 0:
            raise ValueError(f"meta.version must be a positive integer, got {meta.version}")
        result = self._weight_update_ctrl.update_weights(version=meta.version)
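Points 1 and 3 can be reproduced in isolation. The sketch below uses a hypothetical `FakeRollout` class and a hypothetical `check_version` helper rather than the real controller, and assumes only that `pause_generation` is declared with `async def`, as the comment states.

```python
import asyncio
import inspect


class FakeRollout:
    """Hypothetical stand-in with an async pause method."""

    def __init__(self) -> None:
        self.paused = False

    async def pause_generation(self) -> None:
        self.paused = True


def check_version(version):
    # Explicit check: unlike assert, this still runs under `python -O`.
    if version is None or version <= 0:
        raise ValueError(
            f"meta.version must be a positive integer, got {version}"
        )
    return version


rollout = FakeRollout()

# Calling without await only creates a coroutine object; the body never runs.
pending = rollout.pause_generation()
created_coroutine = inspect.iscoroutine(pending)
paused_before = rollout.paused
pending.close()  # discard it cleanly to avoid a "never awaited" warning

# Driving the coroutine on an event loop actually executes the body.
asyncio.run(rollout.pause_generation())
paused_after = rollout.paused
```

Here `paused_before` stays False because the un-awaited call did nothing, which is the failure mode described in point 1.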

@sitabulaixizawaluduo sitabulaixizawaluduo force-pushed the feat/training-controller-v2-parity branch from ce21c49 to 2e4dffd Compare May 25, 2026 07:19
@sitabulaixizawaluduo sitabulaixizawaluduo marked this pull request as draft May 25, 2026 07:20
@sitabulaixizawaluduo sitabulaixizawaluduo marked this pull request as ready for review May 26, 2026 06:21
Bring GatewayTrainController and RolloutControllerV2 to full
parity with v1 controllers for RL training paths.

Key changes:
- Route to RolloutControllerV2 when config._version=="v2"
- Add version management, connect_engine, clear_batches to GatewayTrainController
- Unify HTTP client session in GatewayTrainController (follows PR #1354)
- Switch default workflow to MathAgent in example configs
- Add agent config section to all example YAML files
- Remove obsolete get_custom_reward_fn from reward module
@sitabulaixizawaluduo sitabulaixizawaluduo force-pushed the feat/training-controller-v2-parity branch from 2e4dffd to 2541bbd Compare May 26, 2026 06:21
@sitabulaixizawaluduo sitabulaixizawaluduo added the safe-to-test Ready to run unit-tests in a PR. label May 26, 2026
@sitabulaixizawaluduo sitabulaixizawaluduo added safe-to-test Ready to run unit-tests in a PR. and removed safe-to-test Ready to run unit-tests in a PR. labels May 26, 2026
@sitabulaixizawaluduo sitabulaixizawaluduo changed the title from "feat: enable v2 training pipeline with controller parity" to "feat:enable v2 training pipeline with controller parity" May 26, 2026
@sitabulaixizawaluduo sitabulaixizawaluduo added safe-to-test Ready to run unit-tests in a PR. and removed safe-to-test Ready to run unit-tests in a PR. labels May 26, 2026
@sitabulaixizawaluduo sitabulaixizawaluduo added safe-to-test Ready to run unit-tests in a PR. and removed safe-to-test Ready to run unit-tests in a PR. labels May 27, 2026
@sitabulaixizawaluduo sitabulaixizawaluduo deployed to AReaL-unittests with GitHub Actions May 27, 2026 07:28
@TaoZex (Collaborator) left a comment

LGTM

@fishcrap (Collaborator) left a comment

LGTM

@sitabulaixizawaluduo sitabulaixizawaluduo merged commit 0cfcd04 into main May 29, 2026
13 checks passed
@sitabulaixizawaluduo sitabulaixizawaluduo deleted the feat/training-controller-v2-parity branch May 29, 2026 05:56
Labels

safe-to-test Ready to run unit-tests in a PR.

3 participants