Skip to content

rollout: enable MooncakeStoreConnector with hard-reset on weight update#1

Open
aoshen02 wants to merge 16 commits into
mainfrom
feat/mooncake-store-int
Open

rollout: enable MooncakeStoreConnector with hard-reset on weight update#1
aoshen02 wants to merge 16 commits into
mainfrom
feat/mooncake-store-int

Conversation

@aoshen02
Copy link
Copy Markdown
Owner

@aoshen02 aoshen02 commented May 11, 2026

Summary

  • Adds actor_rollout_ref.rollout.kv_store config (enable, kv_connector, kv_role, config_path, extra_config, on_failure) and wires the vLLM rollout engine launch with --kv-transfer-config so an external MooncakeStoreConnector can pool prefix KV across rollout replicas.
  • Propagates reset_connector=True on every prefix-cache reset path (wake_up, clear_kv_cache, abort_all_requests) when kv_store.enable is set, so the external Mooncake master is cleared via store.remove_all(force=True) on every weight update. RL-correct hard-reset path: external KV blocks computed against the previous model weights are dropped before any new rollout request can read them.
  • Adds docs/advance/rollout_kv_offload.md (linked from docs/index.rst under "Advanced Features") documenting when to use it, the full reset cascade, the config schema, env vars, operational notes, and a comparison vs SGLang's HiCacheStorage flow.

Why this matters

verl's existing update_weights flow already does the right thing for vLLM's in-engine prefix cache: abort_all_requests drains in-flight, sleep + NCCL sync the weights, then wake_up calls engine.reset_prefix_cache(). The guard in BlockPool.reset_prefix_cache wipes the in-memory hash table because abort drained everything.

What it does not do (yet) is reset the external Mooncake KV store. Without this PR, the next request's MooncakeStoreScheduler.get_num_new_matched_tokens() would happily query Mooncake, find a prefix that the previous-weight rollout wrote there, load it, and let the new weights attend to KV computed against the old policy — silent correctness loss.

The vLLM-side companion patch implements MooncakeStoreConnector.reset_cache() (paired PR; routes via the existing ZMQ admin channel to worker rank 0 -> store.remove_all(force=True)). This verl-side PR is the wire that triggers it.

Paired vLLM PR

Depends on MooncakeStoreConnector.reset_cache() landing in vLLM. Companion PR: ivanium/vllm#46 (against feat/mooncake-store-int -> upstream vllm-project/vllm).

Soft-dependency contract

KVStoreConfig.on_failure defaults to "fallback": if the Mooncake master is unreachable when the engine launches, training keeps running with the connector disabled (warning logged). Set to "crash" for stricter CI use.

Tests

  • Cascade integration verified inside the production rollout container with the paired vLLM patch applied: connector.reset_cache() -> ZMQ admin RPC -> store.remove_all(force=True) on a per-run Mooncake master. Master Admin Metrics confirmed DelAll=2/2 matching the smoke flow (PutStart:(Req=12/0/12, Item=12/12), ExistKey:(Req=3/0/3, Item=18/18)).
  • Standalone Mooncake remove_all(force=True) validation (put 8 keys -> exist -> remove_all -> exist=0 -> re-put -> exist=1) passed against ivanium/Mooncake yifan/dev build on aarch64 GB200.
  • Lookup-path regression smoke (5 cases incl. lookup-after-reset) verifies the new RESET_MAGIC discriminator did not break the existing ZMQ lookup wire format.
  • Static syntax-check on the modified verl files (py_compile).
  • RolloutConfig(name='vllm') -> cfg.kv_store.enable=False, .kv_connector="MooncakeStoreConnector", .get('enable', False)=False verified against the patched dataclass (importlib overlay test inside the container's verl env).

Test plan

  • Cascade evidence captured + paired vLLM unit tests (7/7 reset tests, 37/37 mooncake-store tests overall)
  • Mooncake remove_all standalone validation against the production master build
  • Lookup-path regression (no wire-format break from RESET_MAGIC addition)
  • verl config dataclass smoke (KVStoreConfig defaults, mapping .get() interface)
  • Full GRPO 5-cycle bench run with kv_store.enable=true — needs a free Ray cluster + ~1 hour bench time; the new doc page lists the expected acceptance criteria.

AI assistance

This PR was authored with the help of Claude Opus 4.7 (1M context). Every line was reviewed end-to-end. The config schema + flag-propagation strategy was chosen to be minimal (one new dataclass, three call-site reset_connector arguments, one CLI flag for vllm-serve, one doc page wired into the existing "Advanced Features" toctree).

Duplicate-work check

No existing verl PR adds MooncakeStoreConnector rollout support. The related PD-disaggregation work on feat/verl-vllm-pd-disagg (the existing PR verl-project#6243 branch) addresses a different connector (MooncakeConnector for PD transport, not MooncakeStoreConnector for offload pool) and a different lifecycle (per-request handshake vs per-update reset). The two could be combined in a MultiConnector config in a follow-up PR.

aoshen02 and others added 13 commits May 11, 2026 11:40
Adds `actor_rollout_ref.rollout.kv_store.{enable,kv_connector,kv_role,
config_path,extra_config,on_failure}` configuration. When `enable=true`,
the vLLM rollout engine is launched with `--kv-transfer-config` wiring
MooncakeStoreConnector, and all prefix-cache reset paths
(`wake_up`, `clear_kv_cache`, `abort_all_requests`) propagate
`reset_connector=True` so the Mooncake master is cleared via
`store.remove_all(force=True)` on every weight update.

This is the RL-correct hard-reset path: external KV blocks computed
against the previous model weights are dropped before any new rollout
request can read them, matching the existing in-engine prefix-cache
invalidation that verl already drives via `abort_all_requests` +
`reset_prefix_cache`.

`on_failure=fallback` (default) makes the connector a soft dependency:
training keeps running with the Mooncake offload disabled if the master
is unreachable at engine launch.

A `recipe_aoshen/phase_b_mooncake.sh` fork of `phase_b.sh` shows the
expected invocation (1 tray, Qwen3-0.6B GSM8K, 5 weight syncs); pair it
with `start_master.sh` in `projects/mooncake-integration/scripts/` for
a per-run master.

Paired with the vLLM-side cascade hook PR (MooncakeStoreConnector.
reset_cache routes through the existing ZMQ admin channel to worker
rank 0 -> store.remove_all).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drops the recipe_aoshen/phase_b_mooncake.sh example (sample-only, not
upstream-relevant) and instead documents the new
actor_rollout_ref.rollout.kv_store knob and the RL-correctness contract
(hard-reset on every weight update via reset_connector=True) in a new
advance/rollout_kv_offload.md page.

The doc covers:
- When to enable the feature (and when not to)
- The hard-reset cascade ending in store.remove_all(force=True) so
  reviewers can trace it without re-reading the diff
- Configuration reference for every kv_store.* field
- Required env vars (MOONCAKE_CONFIG_PATH, optional PYTHONHASHSEED=0)
- Operational notes (per-run master, reset cost, failure modes)
- Comparison vs SGLang's opt-in flush flow

Wired into docs/index.rst under "Advanced Features" next to
rollout_skip / rollout_trace.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two cleanups requested in PR review:

1. Use pause_generation's new reset_connector kwarg instead of an
   extra engine.reset_prefix_cache call afterwards. The paired vLLM
   patch now threads reset_connector all the way through
   pause_generation -> pause_scheduler_async -> EngineCore.
   pause_scheduler -> _reset_caches -> Scheduler.reset_prefix_cache,
   so the hard-reset is a single call:

     await self.engine.pause_generation(
         wait_for_inflight_requests=False,
         clear_cache=reset_prefix_cache,
         reset_connector=reset_prefix_cache and self._kv_store_enabled,
     )

   No more "AsyncLLM.pause_generation does not propagate
   reset_connector" workaround.

2. Move docs/advance/rollout_kv_offload.md ->
   docs/perf/rollout_kv_offload.md. This is a performance / KV
   offload feature, fits better under "Performance Tuning Guide"
   next to perf/perf_tuning, perf/dpsk, etc. than under
   "Advanced Features".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous commit ("simplify Mooncake hard-reset and move doc to perf/")
only landed the file rename; the two payload edits accidentally got
dropped from the staging round. This commit ships them:

- verl/workers/rollout/vllm_rollout/vllm_async_server.py: replace the
  trailing `await self.engine.reset_prefix_cache(reset_connector=True)`
  workaround with a single `pause_generation(..., reset_connector=...)`
  call (rides on the paired vllm reset_connector kwarg now plumbed
  through pause_generation -> EngineCore._reset_caches).
- docs/index.rst: move rollout_kv_offload from "Advanced Features"
  toctree to "Performance Tuning Guide" toctree, next to perf_tuning
  and dpsk where it belongs as a perf/KV-offload feature.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The flag served two roles:
  1. Gate whether to attach kv_transfer_config to vllm serve args at
     launch time.
  2. Gate whether to pass reset_connector=True on every cache reset.

Role 2 is unnecessary now that the paired vLLM patch (scheduler:
treat reset_connector with no connector as no-op success) makes
Scheduler.reset_connector_cache return True without a warning when
no connector is attached. The reset paths can simply ask for a
connector reset unconditionally; the engine decides what to do.

Role 1 stays as an inline check on `kv_store_cfg.get("enable", False)`
in launch_server — no need to remember the result on the adapter.

Result: three reset call sites (`wake_up`, `clear_kv_cache`,
`abort_all_requests` via `pause_generation`) all pass
`reset_connector=True` unconditionally. No instance flag, no
conditional, less state to reason about.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…g passthrough

Replace the verl-specific KVStoreConfig dataclass (enable/kv_connector/
kv_role/config_path/extra_config/on_failure) with the existing generic
`actor_rollout_ref.rollout.engine_kwargs.vllm.<key>` passthrough. Users
opting into the Mooncake-Store offload now set
`engine_kwargs.vllm.kv_transfer_config` to the JSON string that vLLM's
own --kv-transfer-config CLI flag already accepts (vLLM decodes it into
its first-class KVTransferConfig).

Net effect:
- No verl-side schema to learn / document / migrate when vLLM adds new
  kv_transfer_config fields or new KV connectors (NixlConnector,
  P2pNcclConnector, MultiConnector, future ones — all work without a
  verl change).
- ~40 lines of dataclass + 23 lines of "if enable: build dict" launch
  logic deleted; no behavior change for existing users (the field was
  default-disabled).
- The on_failure="fallback" soft-dependency knob is dropped. Silent
  disable of a configured KV store is the wrong default for RL runs —
  hours of post-update stale-cache reads would be hidden. vLLM serve
  now fails loud if the Mooncake master is unreachable at engine
  launch; callers wanting soft mode can wrap the launch with a
  pre-flight healthcheck.

Also drop the now-unsupported `reset_connector=` kwarg from the
`AsyncLLM.pause_generation(...)` call in abort_all_requests — upstream
vLLM does not (and per the paired cascade PR, does not need to) expose
that kwarg. EngineCore._reset_caches defaults reset_connector=True so
the connector cascade fires automatically whenever pause_generation
runs with clear_cache=True. The wake_up / clear_kv_cache call sites
keep the explicit reset_connector=True on reset_prefix_cache (the
supported upstream entry point); both are no-op success when no
connector is configured, so they remain safe to pass unconditionally.

Paired vLLM PR: vllm-project/vllm#42694.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…onfig passthrough

The verl-side KVStoreConfig dataclass is gone; the doc page no longer
needs to document its fields. Replace the configuration section with
the actual minimal recipe: set engine_kwargs.vllm.kv_transfer_config
to a JSON string that vLLM's KVTransferConfig already accepts.

Also:
- Update the RL-correctness section to describe both reset entry
  points: explicit reset_prefix_cache(reset_connector=True) from
  wake_up / clear_kv_cache, and the automatic cascade through
  EngineCore._reset_caches when abort_all_requests goes through
  pause_generation(clear_cache=True).
- Pin the paired vLLM PR reference to the canonical upstream PR
  vllm-project/vllm#42694 (replaces the earlier ivanium-fork link).
- Drop the on_failure "fallback" row from the SGLang comparison
  table; verl no longer offers that knob — vLLM serve fails loud if
  the master is unreachable, which is the right default for RL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ream doc

Cut 188 → 89 lines. The previous version duplicated vLLM-side setup
(client install, master launch, JSON config) that already lives in the
official vLLM Mooncake guide and drifts out of date here. Point readers
at <https://docs.vllm.ai/en/latest/features/mooncake_store_connector_usage/>
for setup and keep only what is verl-specific: the engine_kwargs.vllm.
kv_transfer_config wiring, the reset_connector=True cascade contract for
RL correctness (with the required vLLM build), and the when-to-enable /
failure-mode summary.

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: aoshen <aoshen@inferact.ai>
89 → 57 lines. Cut three sections that don't belong in a "how to enable"
doc: Failure modes (operational advice that drifts), When to enable
(opinion / negative recommendation), the duplicate Hydra CLI override
block (YAML form is enough), and a trailing line on no-op upstream
behavior. Keep only: setup link to vLLM doc, the YAML wiring, and the
RL-correctness reset cascade contract with required vLLM build.

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: aoshen <aoshen@inferact.ai>
Some users drive verl entirely from CLI overrides without touching YAML
files, so keep the CLI form as a parallel example.

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: aoshen <aoshen@inferact.ai>
…ooncake-store-int

# Conflicts:
#	docs/index.rst
#	verl/workers/rollout/vllm_rollout/vllm_async_server.py
@aoshen02 aoshen02 force-pushed the feat/mooncake-store-int branch from 12a63e3 to 07f1765 Compare May 27, 2026 13:05
@aoshen02 aoshen02 force-pushed the feat/mooncake-store-int branch 2 times, most recently from 16b392b to 00346ad Compare May 27, 2026 13:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant