rollout: enable MooncakeStoreConnector with hard-reset on weight update#1
Open
aoshen02 wants to merge 16 commits into
Open
rollout: enable MooncakeStoreConnector with hard-reset on weight update#1aoshen02 wants to merge 16 commits into
aoshen02 wants to merge 16 commits into
Conversation
Adds `actor_rollout_ref.rollout.kv_store.{enable,kv_connector,kv_role,
config_path,extra_config,on_failure}` configuration. When `enable=true`,
the vLLM rollout engine is launched with `--kv-transfer-config` wiring
MooncakeStoreConnector, and all prefix-cache reset paths
(`wake_up`, `clear_kv_cache`, `abort_all_requests`) propagate
`reset_connector=True` so the Mooncake master is cleared via
`store.remove_all(force=True)` on every weight update.
This is the RL-correct hard-reset path: external KV blocks computed
against the previous model weights are dropped before any new rollout
request can read them, matching the existing in-engine prefix-cache
invalidation that verl already drives via `abort_all_requests` +
`reset_prefix_cache`.
`on_failure=fallback` (default) makes the connector a soft dependency:
training keeps running with the Mooncake offload disabled if the master
is unreachable at engine launch.
A `recipe_aoshen/phase_b_mooncake.sh` fork of `phase_b.sh` shows the
expected invocation (1 tray, Qwen3-0.6B GSM8K, 5 weight syncs); pair it
with `start_master.sh` in `projects/mooncake-integration/scripts/` for
a per-run master.
Paired with the vLLM-side cascade hook PR (MooncakeStoreConnector.
reset_cache routes through the existing ZMQ admin channel to worker
rank 0 -> store.remove_all).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drops the recipe_aoshen/phase_b_mooncake.sh example (sample-only, not upstream-relevant) and instead documents the new actor_rollout_ref.rollout.kv_store knob and the RL-correctness contract (hard-reset on every weight update via reset_connector=True) in a new advance/rollout_kv_offload.md page. The doc covers: - When to enable the feature (and when not to) - The hard-reset cascade ending in store.remove_all(force=True) so reviewers can trace it without re-reading the diff - Configuration reference for every kv_store.* field - Required env vars (MOONCAKE_CONFIG_PATH, optional PYTHONHASHSEED=0) - Operational notes (per-run master, reset cost, failure modes) - Comparison vs SGLang's opt-in flush flow Wired into docs/index.rst under "Advanced Features" next to rollout_skip / rollout_trace. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two cleanups requested in PR review:
1. Use pause_generation's new reset_connector kwarg instead of an
extra engine.reset_prefix_cache call afterwards. The paired vLLM
patch now threads reset_connector all the way through
pause_generation -> pause_scheduler_async -> EngineCore.
pause_scheduler -> _reset_caches -> Scheduler.reset_prefix_cache,
so the hard-reset is a single call:
await self.engine.pause_generation(
wait_for_inflight_requests=False,
clear_cache=reset_prefix_cache,
reset_connector=reset_prefix_cache and self._kv_store_enabled,
)
No more "AsyncLLM.pause_generation does not propagate
reset_connector" workaround.
2. Move docs/advance/rollout_kv_offload.md ->
docs/perf/rollout_kv_offload.md. This is a performance / KV
offload feature, fits better under "Performance Tuning Guide"
next to perf/perf_tuning, perf/dpsk, etc. than under
"Advanced Features".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous commit ("simplify Mooncake hard-reset and move doc to perf/")
only landed the file rename; the two payload edits accidentally got
dropped from the staging round. This commit ships them:
- verl/workers/rollout/vllm_rollout/vllm_async_server.py: replace the
trailing `await self.engine.reset_prefix_cache(reset_connector=True)`
workaround with a single `pause_generation(..., reset_connector=...)`
call (rides on the paired vllm reset_connector kwarg now plumbed
through pause_generation -> EngineCore._reset_caches).
- docs/index.rst: move rollout_kv_offload from "Advanced Features"
toctree to "Performance Tuning Guide" toctree, next to perf_tuning
and dpsk where it belongs as a perf/KV-offload feature.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The flag served two roles:
1. Gate whether to attach kv_transfer_config to vllm serve args at
launch time.
2. Gate whether to pass reset_connector=True on every cache reset.
Role 2 is unnecessary now that the paired vLLM patch (scheduler:
treat reset_connector with no connector as no-op success) makes
Scheduler.reset_connector_cache return True without a warning when
no connector is attached. The reset paths can simply ask for a
connector reset unconditionally; the engine decides what to do.
Role 1 stays as an inline check on `kv_store_cfg.get("enable", False)`
in launch_server — no need to remember the result on the adapter.
Result: three reset call sites (`wake_up`, `clear_kv_cache`,
`abort_all_requests` via `pause_generation`) all pass
`reset_connector=True` unconditionally. No instance flag, no
conditional, less state to reason about.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…g passthrough Replace the verl-specific KVStoreConfig dataclass (enable/kv_connector/ kv_role/config_path/extra_config/on_failure) with the existing generic `actor_rollout_ref.rollout.engine_kwargs.vllm.<key>` passthrough. Users opting into the Mooncake-Store offload now set `engine_kwargs.vllm.kv_transfer_config` to the JSON string that vLLM's own --kv-transfer-config CLI flag already accepts (vLLM decodes it into its first-class KVTransferConfig). Net effect: - No verl-side schema to learn / document / migrate when vLLM adds new kv_transfer_config fields or new KV connectors (NixlConnector, P2pNcclConnector, MultiConnector, future ones — all work without a verl change). - ~40 lines of dataclass + 23 lines of "if enable: build dict" launch logic deleted; no behavior change for existing users (the field was default-disabled). - The on_failure="fallback" soft-dependency knob is dropped. Silent disable of a configured KV store is the wrong default for RL runs — hours of post-update stale-cache reads would be hidden. vLLM serve now fails loud if the Mooncake master is unreachable at engine launch; callers wanting soft mode can wrap the launch with a pre-flight healthcheck. Also drop the now-unsupported `reset_connector=` kwarg from the `AsyncLLM.pause_generation(...)` call in abort_all_requests — upstream vLLM does not (and per the paired cascade PR, does not need to) expose that kwarg. EngineCore._reset_caches defaults reset_connector=True so the connector cascade fires automatically whenever pause_generation runs with clear_cache=True. The wake_up / clear_kv_cache call sites keep the explicit reset_connector=True on reset_prefix_cache (the supported upstream entry point); both are no-op success when no connector is configured, so they remain safe to pass unconditionally. Paired vLLM PR: vllm-project/vllm#42694. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…onfig passthrough The verl-side KVStoreConfig dataclass is gone; the doc page no longer needs to document its fields. Replace the configuration section with the actual minimal recipe: set engine_kwargs.vllm.kv_transfer_config to a JSON string that vLLM's KVTransferConfig already accepts. Also: - Update the RL-correctness section to describe both reset entry points: explicit reset_prefix_cache(reset_connector=True) from wake_up / clear_kv_cache, and the automatic cascade through EngineCore._reset_caches when abort_all_requests goes through pause_generation(clear_cache=True). - Pin the paired vLLM PR reference to the canonical upstream PR vllm-project/vllm#42694 (replaces the earlier ivanium-fork link). - Drop the on_failure "fallback" row from the SGLang comparison table; verl no longer offers that knob — vLLM serve fails loud if the master is unreachable, which is the right default for RL. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ream doc Cut 188 → 89 lines. The previous version duplicated vLLM-side setup (client install, master launch, JSON config) that already lives in the official vLLM Mooncake guide and drifts out of date here. Point readers at <https://docs.vllm.ai/en/latest/features/mooncake_store_connector_usage/> for setup and keep only what is verl-specific: the engine_kwargs.vllm. kv_transfer_config wiring, the reset_connector=True cascade contract for RL correctness (with the required vLLM build), and the when-to-enable / failure-mode summary. Co-authored-by: Claude <noreply@anthropic.com> Signed-off-by: aoshen <aoshen@inferact.ai>
89 → 57 lines. Cut three sections that don't belong in a "how to enable" doc: Failure modes (operational advice that drifts), When to enable (opinion / negative recommendation), the duplicate Hydra CLI override block (YAML form is enough), and a trailing line on no-op upstream behavior. Keep only: setup link to vLLM doc, the YAML wiring, and the RL-correctness reset cascade contract with required vLLM build. Co-authored-by: Claude <noreply@anthropic.com> Signed-off-by: aoshen <aoshen@inferact.ai>
Some users drive verl entirely from CLI overrides without touching YAML files, so keep the CLI form as a parallel example. Co-authored-by: Claude <noreply@anthropic.com> Signed-off-by: aoshen <aoshen@inferact.ai>
…ooncake-store-int # Conflicts: # docs/index.rst # verl/workers/rollout/vllm_rollout/vllm_async_server.py
12a63e3 to
07f1765
Compare
16b392b to
00346ad
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
actor_rollout_ref.rollout.kv_storeconfig (enable,kv_connector,kv_role,config_path,extra_config,on_failure) and wires the vLLM rollout engine launch with--kv-transfer-configso an externalMooncakeStoreConnectorcan pool prefix KV across rollout replicas.reset_connector=Trueon every prefix-cache reset path (wake_up,clear_kv_cache,abort_all_requests) whenkv_store.enableis set, so the external Mooncake master is cleared viastore.remove_all(force=True)on every weight update. RL-correct hard-reset path: external KV blocks computed against the previous model weights are dropped before any new rollout request can read them.docs/advance/rollout_kv_offload.md(linked fromdocs/index.rstunder "Advanced Features") documenting when to use it, the full reset cascade, the config schema, env vars, operational notes, and a comparison vs SGLang's HiCacheStorage flow.Why this matters
verl's existing
update_weightsflow already does the right thing for vLLM's in-engine prefix cache:abort_all_requestsdrains in-flight, sleep + NCCL sync the weights, thenwake_upcallsengine.reset_prefix_cache(). The guard inBlockPool.reset_prefix_cachewipes the in-memory hash table because abort drained everything.What it does not do (yet) is reset the external Mooncake KV store. Without this PR, the next request's
MooncakeStoreScheduler.get_num_new_matched_tokens()would happily query Mooncake, find a prefix that the previous-weight rollout wrote there, load it, and let the new weights attend to KV computed against the old policy — silent correctness loss.The vLLM-side companion patch implements
MooncakeStoreConnector.reset_cache()(paired PR; routes via the existing ZMQ admin channel to worker rank 0 ->store.remove_all(force=True)). This verl-side PR is the wire that triggers it.Paired vLLM PR
Depends on
MooncakeStoreConnector.reset_cache()landing in vLLM. Companion PR: ivanium/vllm#46 (againstfeat/mooncake-store-int-> upstream vllm-project/vllm).Soft-dependency contract
KVStoreConfig.on_failuredefaults to"fallback": if the Mooncake master is unreachable when the engine launches, training keeps running with the connector disabled (warning logged). Set to"crash"for stricter CI use.Tests
connector.reset_cache()-> ZMQ admin RPC ->store.remove_all(force=True)on a per-run Mooncake master. Master Admin Metrics confirmedDelAll=2/2matching the smoke flow (PutStart:(Req=12/0/12, Item=12/12),ExistKey:(Req=3/0/3, Item=18/18)).remove_all(force=True)validation (put 8 keys -> exist -> remove_all -> exist=0 -> re-put -> exist=1) passed againstivanium/Mooncake yifan/devbuild on aarch64 GB200.RESET_MAGICdiscriminator did not break the existing ZMQ lookup wire format.py_compile).RolloutConfig(name='vllm')->cfg.kv_store.enable=False,.kv_connector="MooncakeStoreConnector",.get('enable', False)=Falseverified against the patched dataclass (importlib overlay test inside the container's verl env).Test plan
remove_allstandalone validation against the production master build.get()interface)kv_store.enable=true— needs a free Ray cluster + ~1 hour bench time; the new doc page lists the expected acceptance criteria.AI assistance
This PR was authored with the help of Claude Opus 4.7 (1M context). Every line was reviewed end-to-end. The config schema + flag-propagation strategy was chosen to be minimal (one new dataclass, three call-site reset_connector arguments, one CLI flag for vllm-serve, one doc page wired into the existing "Advanced Features" toctree).
Duplicate-work check
No existing verl PR adds
MooncakeStoreConnectorrollout support. The related PD-disaggregation work onfeat/verl-vllm-pd-disagg(the existing PR verl-project#6243 branch) addresses a different connector (MooncakeConnectorfor PD transport, notMooncakeStoreConnectorfor offload pool) and a different lifecycle (per-request handshake vs per-update reset). The two could be combined in aMultiConnectorconfig in a follow-up PR.