rollout: enable MooncakeStoreConnector with hard-reset on weight update by aoshen02 · Pull Request #1 · aoshen02/verl

aoshen02 · 2026-05-11T11:42:09Z

Summary

Adds actor_rollout_ref.rollout.kv_store config (enable, kv_connector, kv_role, config_path, extra_config, on_failure) and wires the vLLM rollout engine launch with --kv-transfer-config so an external MooncakeStoreConnector can pool prefix KV across rollout replicas.
Propagates reset_connector=True on every prefix-cache reset path (wake_up, clear_kv_cache, abort_all_requests) when kv_store.enable is set, so the external Mooncake master is cleared via store.remove_all(force=True) on every weight update. RL-correct hard-reset path: external KV blocks computed against the previous model weights are dropped before any new rollout request can read them.
Adds docs/advance/rollout_kv_offload.md (linked from docs/index.rst under "Advanced Features") documenting when to use it, the full reset cascade, the config schema, env vars, operational notes, and a comparison vs SGLang's HiCacheStorage flow.

Why this matters

verl's existing update_weights flow already does the right thing for vLLM's in-engine prefix cache: abort_all_requests drains in-flight, sleep + NCCL sync the weights, then wake_up calls engine.reset_prefix_cache(). The guard in BlockPool.reset_prefix_cache wipes the in-memory hash table because abort drained everything.

What it does not do (yet) is reset the external Mooncake KV store. Without this PR, the next request's MooncakeStoreScheduler.get_num_new_matched_tokens() would happily query Mooncake, find a prefix that the previous-weight rollout wrote there, load it, and let the new weights attend to KV computed against the old policy — silent correctness loss.

The vLLM-side companion patch implements MooncakeStoreConnector.reset_cache() (paired PR; routes via the existing ZMQ admin channel to worker rank 0 -> store.remove_all(force=True)). This verl-side PR is the wire that triggers it.

Paired vLLM PR

Depends on MooncakeStoreConnector.reset_cache() landing in vLLM. Companion PR: ivanium/vllm#46 (against feat/mooncake-store-int -> upstream vllm-project/vllm).

Soft-dependency contract

KVStoreConfig.on_failure defaults to "fallback": if the Mooncake master is unreachable when the engine launches, training keeps running with the connector disabled (warning logged). Set to "crash" for stricter CI use.

Tests

Cascade integration verified inside the production rollout container with the paired vLLM patch applied: connector.reset_cache() -> ZMQ admin RPC -> store.remove_all(force=True) on a per-run Mooncake master. Master Admin Metrics confirmed DelAll=2/2 matching the smoke flow (PutStart:(Req=12/0/12, Item=12/12), ExistKey:(Req=3/0/3, Item=18/18)).
Standalone Mooncake remove_all(force=True) validation (put 8 keys -> exist -> remove_all -> exist=0 -> re-put -> exist=1) passed against ivanium/Mooncake yifan/dev build on aarch64 GB200.
Lookup-path regression smoke (5 cases incl. lookup-after-reset) verifies the new RESET_MAGIC discriminator did not break the existing ZMQ lookup wire format.
Static syntax-check on the modified verl files (py_compile).
RolloutConfig(name='vllm') -> cfg.kv_store.enable=False, .kv_connector="MooncakeStoreConnector", .get('enable', False)=False verified against the patched dataclass (importlib overlay test inside the container's verl env).

Test plan

Cascade evidence captured + paired vLLM unit tests (7/7 reset tests, 37/37 mooncake-store tests overall)
Mooncake remove_all standalone validation against the production master build
Lookup-path regression (no wire-format break from RESET_MAGIC addition)
verl config dataclass smoke (KVStoreConfig defaults, mapping .get() interface)
Full GRPO 5-cycle bench run with kv_store.enable=true — needs a free Ray cluster + ~1 hour bench time; the new doc page lists the expected acceptance criteria.

AI assistance

This PR was authored with the help of Claude Opus 4.7 (1M context). Every line was reviewed end-to-end. The config schema + flag-propagation strategy was chosen to be minimal (one new dataclass, three call-site reset_connector arguments, one CLI flag for vllm-serve, one doc page wired into the existing "Advanced Features" toctree).

Duplicate-work check

No existing verl PR adds MooncakeStoreConnector rollout support. The related PD-disaggregation work on feat/verl-vllm-pd-disagg (the existing PR verl-project#6243 branch) addresses a different connector (MooncakeConnector for PD transport, not MooncakeStoreConnector for offload pool) and a different lifecycle (per-request handshake vs per-update reset). The two could be combined in a MultiConnector config in a follow-up PR.

Adds `actor_rollout_ref.rollout.kv_store.{enable,kv_connector,kv_role, config_path,extra_config,on_failure}` configuration. When `enable=true`, the vLLM rollout engine is launched with `--kv-transfer-config` wiring MooncakeStoreConnector, and all prefix-cache reset paths (`wake_up`, `clear_kv_cache`, `abort_all_requests`) propagate `reset_connector=True` so the Mooncake master is cleared via `store.remove_all(force=True)` on every weight update. This is the RL-correct hard-reset path: external KV blocks computed against the previous model weights are dropped before any new rollout request can read them, matching the existing in-engine prefix-cache invalidation that verl already drives via `abort_all_requests` + `reset_prefix_cache`. `on_failure=fallback` (default) makes the connector a soft dependency: training keeps running with the Mooncake offload disabled if the master is unreachable at engine launch. A `recipe_aoshen/phase_b_mooncake.sh` fork of `phase_b.sh` shows the expected invocation (1 tray, Qwen3-0.6B GSM8K, 5 weight syncs); pair it with `start_master.sh` in `projects/mooncake-integration/scripts/` for a per-run master. Paired with the vLLM-side cascade hook PR (MooncakeStoreConnector. reset_cache routes through the existing ZMQ admin channel to worker rank 0 -> store.remove_all). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Drops the recipe_aoshen/phase_b_mooncake.sh example (sample-only, not upstream-relevant) and instead documents the new actor_rollout_ref.rollout.kv_store knob and the RL-correctness contract (hard-reset on every weight update via reset_connector=True) in a new advance/rollout_kv_offload.md page. The doc covers: - When to enable the feature (and when not to) - The hard-reset cascade ending in store.remove_all(force=True) so reviewers can trace it without re-reading the diff - Configuration reference for every kv_store.* field - Required env vars (MOONCAKE_CONFIG_PATH, optional PYTHONHASHSEED=0) - Operational notes (per-run master, reset cost, failure modes) - Comparison vs SGLang's opt-in flush flow Wired into docs/index.rst under "Advanced Features" next to rollout_skip / rollout_trace. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two cleanups requested in PR review: 1. Use pause_generation's new reset_connector kwarg instead of an extra engine.reset_prefix_cache call afterwards. The paired vLLM patch now threads reset_connector all the way through pause_generation -> pause_scheduler_async -> EngineCore. pause_scheduler -> _reset_caches -> Scheduler.reset_prefix_cache, so the hard-reset is a single call: await self.engine.pause_generation( wait_for_inflight_requests=False, clear_cache=reset_prefix_cache, reset_connector=reset_prefix_cache and self._kv_store_enabled, ) No more "AsyncLLM.pause_generation does not propagate reset_connector" workaround. 2. Move docs/advance/rollout_kv_offload.md -> docs/perf/rollout_kv_offload.md. This is a performance / KV offload feature, fits better under "Performance Tuning Guide" next to perf/perf_tuning, perf/dpsk, etc. than under "Advanced Features". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Previous commit ("simplify Mooncake hard-reset and move doc to perf/") only landed the file rename; the two payload edits accidentally got dropped from the staging round. This commit ships them: - verl/workers/rollout/vllm_rollout/vllm_async_server.py: replace the trailing `await self.engine.reset_prefix_cache(reset_connector=True)` workaround with a single `pause_generation(..., reset_connector=...)` call (rides on the paired vllm reset_connector kwarg now plumbed through pause_generation -> EngineCore._reset_caches). - docs/index.rst: move rollout_kv_offload from "Advanced Features" toctree to "Performance Tuning Guide" toctree, next to perf_tuning and dpsk where it belongs as a perf/KV-offload feature. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The flag served two roles: 1. Gate whether to attach kv_transfer_config to vllm serve args at launch time. 2. Gate whether to pass reset_connector=True on every cache reset. Role 2 is unnecessary now that the paired vLLM patch (scheduler: treat reset_connector with no connector as no-op success) makes Scheduler.reset_connector_cache return True without a warning when no connector is attached. The reset paths can simply ask for a connector reset unconditionally; the engine decides what to do. Role 1 stays as an inline check on `kv_store_cfg.get("enable", False)` in launch_server — no need to remember the result on the adapter. Result: three reset call sites (`wake_up`, `clear_kv_cache`, `abort_all_requests` via `pause_generation`) all pass `reset_connector=True` unconditionally. No instance flag, no conditional, less state to reason about. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…g passthrough Replace the verl-specific KVStoreConfig dataclass (enable/kv_connector/ kv_role/config_path/extra_config/on_failure) with the existing generic `actor_rollout_ref.rollout.engine_kwargs.vllm.<key>` passthrough. Users opting into the Mooncake-Store offload now set `engine_kwargs.vllm.kv_transfer_config` to the JSON string that vLLM's own --kv-transfer-config CLI flag already accepts (vLLM decodes it into its first-class KVTransferConfig). Net effect: - No verl-side schema to learn / document / migrate when vLLM adds new kv_transfer_config fields or new KV connectors (NixlConnector, P2pNcclConnector, MultiConnector, future ones — all work without a verl change). - ~40 lines of dataclass + 23 lines of "if enable: build dict" launch logic deleted; no behavior change for existing users (the field was default-disabled). - The on_failure="fallback" soft-dependency knob is dropped. Silent disable of a configured KV store is the wrong default for RL runs — hours of post-update stale-cache reads would be hidden. vLLM serve now fails loud if the Mooncake master is unreachable at engine launch; callers wanting soft mode can wrap the launch with a pre-flight healthcheck. Also drop the now-unsupported `reset_connector=` kwarg from the `AsyncLLM.pause_generation(...)` call in abort_all_requests — upstream vLLM does not (and per the paired cascade PR, does not need to) expose that kwarg. EngineCore._reset_caches defaults reset_connector=True so the connector cascade fires automatically whenever pause_generation runs with clear_cache=True. The wake_up / clear_kv_cache call sites keep the explicit reset_connector=True on reset_prefix_cache (the supported upstream entry point); both are no-op success when no connector is configured, so they remain safe to pass unconditionally. Paired vLLM PR: vllm-project/vllm#42694. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…onfig passthrough The verl-side KVStoreConfig dataclass is gone; the doc page no longer needs to document its fields. Replace the configuration section with the actual minimal recipe: set engine_kwargs.vllm.kv_transfer_config to a JSON string that vLLM's KVTransferConfig already accepts. Also: - Update the RL-correctness section to describe both reset entry points: explicit reset_prefix_cache(reset_connector=True) from wake_up / clear_kv_cache, and the automatic cascade through EngineCore._reset_caches when abort_all_requests goes through pause_generation(clear_cache=True). - Pin the paired vLLM PR reference to the canonical upstream PR vllm-project/vllm#42694 (replaces the earlier ivanium-fork link). - Drop the on_failure "fallback" row from the SGLang comparison table; verl no longer offers that knob — vLLM serve fails loud if the master is unreachable, which is the right default for RL. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ream doc Cut 188 → 89 lines. The previous version duplicated vLLM-side setup (client install, master launch, JSON config) that already lives in the official vLLM Mooncake guide and drifts out of date here. Point readers at <https://docs.vllm.ai/en/latest/features/mooncake_store_connector_usage/> for setup and keep only what is verl-specific: the engine_kwargs.vllm. kv_transfer_config wiring, the reset_connector=True cascade contract for RL correctness (with the required vLLM build), and the when-to-enable / failure-mode summary. Co-authored-by: Claude <noreply@anthropic.com> Signed-off-by: aoshen <aoshen@inferact.ai>

89 → 57 lines. Cut three sections that don't belong in a "how to enable" doc: Failure modes (operational advice that drifts), When to enable (opinion / negative recommendation), the duplicate Hydra CLI override block (YAML form is enough), and a trailing line on no-op upstream behavior. Keep only: setup link to vLLM doc, the YAML wiring, and the RL-correctness reset cascade contract with required vLLM build. Co-authored-by: Claude <noreply@anthropic.com> Signed-off-by: aoshen <aoshen@inferact.ai>

Some users drive verl entirely from CLI overrides without touching YAML files, so keep the CLI form as a parallel example. Co-authored-by: Claude <noreply@anthropic.com> Signed-off-by: aoshen <aoshen@inferact.ai>

…ooncake-store-int # Conflicts: # docs/index.rst # verl/workers/rollout/vllm_rollout/vllm_async_server.py

This reverts commit 07f1765.

aoshen02 and others added 13 commits May 11, 2026 11:40

docs(rollout_kv_offload): restore Hydra CLI override example

fb7ceef

Some users drive verl entirely from CLI overrides without touching YAML files, so keep the CLI form as a parallel example. Co-authored-by: Claude <noreply@anthropic.com> Signed-off-by: aoshen <aoshen@inferact.ai>

Merge remote-tracking branch 'refs/remotes/upstream/main' into feat/m…

9cfadd9

…ooncake-store-int # Conflicts: # docs/index.rst # verl/workers/rollout/vllm_rollout/vllm_async_server.py

docs: simplify rollout kv offload guide

baf6efd

fix(vllm): guard reset_connector for older vllm

07f1765

aoshen02 force-pushed the feat/mooncake-store-int branch from 12a63e3 to 07f1765 Compare May 27, 2026 13:05

aoshen02 added 3 commits May 27, 2026 13:10

Revert "fix(vllm): guard reset_connector for older vllm"

6fd5654

This reverts commit 07f1765.

fix(vllm): guard reset_connector for vllm 0.13

e93c8f6

docs: document vllm version requirement for kv offload

00346ad

aoshen02 force-pushed the feat/mooncake-store-int branch 2 times, most recently from 16b392b to 00346ad Compare May 27, 2026 13:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rollout: enable MooncakeStoreConnector with hard-reset on weight update#1

rollout: enable MooncakeStoreConnector with hard-reset on weight update#1
aoshen02 wants to merge 16 commits into
mainfrom
feat/mooncake-store-int

aoshen02 commented May 11, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

aoshen02 commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why this matters

Paired vLLM PR

Soft-dependency contract

Tests

Test plan

AI assistance

Duplicate-work check

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

aoshen02 commented May 11, 2026 •

edited

Loading