feat: activation patching at scale (capture-sourced, continuous-batching-aware) by RhizoNymph · Pull Request #212 · RhizoNymph/vllm

RhizoNymph · 2026-06-28T04:38:50Z

Adds activation patching — overwriting (alpha=1) or interpolating residual-stream activations at specific (layer, hook, position) sites of a destination request with vectors captured from a prior clean run. Built for performant coarse→fine causal-tracing sweeps that mix freely into a continuously-batched stream: only the patched rows are intervened on, the rest pass through untouched.

What & why

Activation patching is steering with three changes — replace/lerp instead of add, per-(request, layer, hook, position) values, and values sourced from a prior capture run — so it reuses the steering/capture machinery and inherits the hard parts of continuous batching (per-token row gating, position→row mapping under chunked prefill, CUDA-graph-safe persistent buffers).

- Data plane: apply_patch (lerp) + apply_patch_block (a two-tensor post_block variant that reconstructs residual + hidden_states, since vLLM defers the MLP add and replace does not commute through it). Uses the precise lerp form (1-α)·h + α·t, so α=1 is a bit-exact replacement. Folded into apply_layer_steering / apply_block_steering via a process-global slot count, so no model files are edited.
- Injection plane: per-(layer, hook) buffers + a per-step planner (abs_row = token_offset + (dest_pos - num_computed)), ephemeral per-step slots, strict overflow. CUDA-graph-safe (no force-eager).
- Source: run-id-keyed PatchSourceStore (whole-run LRU) populated by a patch_source capture consumer that reuses the capture pipeline. Cross-rank: local resolution under PP, rank-0→peers broadcast under TP.
- Config / spec: --enable-patching, SamplingParams.patch, OpenAI chat+completion plumbing, prefix-cache floor, admission validation.
- Client: examples/online_serving/openai_patch_client.py, which provides PatchStudy (sweep / zoom / heatmap) over the HTTP API.
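The planner formula above can be illustrated with a small sketch. It is not the PR's planner: `plan_patch_row` and the `num_scheduled` parameter are hypothetical names, and the only part taken from the description is `abs_row = token_offset + (dest_pos - num_computed)`. The sketch assumes a patched position is only intervened on in the step whose prefill chunk contains it.

```python
from typing import Optional


def plan_patch_row(
    token_offset: int,
    num_computed: int,
    num_scheduled: int,
    dest_pos: int,
) -> Optional[int]:
    """Map a prompt position to a row of this step's flattened token batch.

    token_offset  -- first batch row owned by the request in this step
    num_computed  -- tokens of the request processed in earlier steps
    num_scheduled -- tokens of the request scheduled in this step (assumed name)
    dest_pos      -- absolute position to patch

    Returns None when dest_pos is outside the current chunk.
    """
    local = dest_pos - num_computed
    if not 0 <= local < num_scheduled:
        return None
    return token_offset + local


# A 10-token prompt prefilled in two chunks (6 tokens, then 4). The request
# starts at batch row 3 in the first step and at row 0 in the second.
step1 = plan_patch_row(token_offset=3, num_computed=0, num_scheduled=6, dest_pos=7)
step2 = plan_patch_row(token_offset=0, num_computed=6, num_scheduled=4, dest_pos=7)
print(step1, step2)  # None 1
```

Position 7 is skipped in the first step because that chunk covers positions 0..5 only, and lands on row 1 of the second step.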

Also cherry-picks the post_block capture-DCE fix (keeps the capture op live under torch.compile) so post_block patching works under CUDA graphs.

GPU-validated on Qwen3-0.6B across {eager, cudagraph} and {TP1/PP1, TP2, PP2}: replace is bit-exact, single-site patches recover the clean answer (denoising), and all parallelism configs agree.


Patch was only wired into the v2 runner, so any model not on the v2 allowlist (e.g. gemma3) silently accepted patch specs without applying them. Wire the same control plane into the v1 GPUModelRunner: PatchModelRunnerMixin, _init_patch_state, per-step _update_patch_buffers, and add/finish hooks. Move the runner-agnostic _patch_add_request into the base mixin (shared by both runners). Root fix: set the process-global patch slot count before the v1 model build so register_steering_buffers attaches patch buffers (the v2 runner already did this; v1 did not, so no patchable layers were discovered). GPU-validated on gemma3-4b (v1 runner) and Qwen3-0.6B (both runners), eager + cudagraph: no-op/self-identity bit-exact, cross-run replace reproduces clean, denoising surfaces the clean answer.


A patched request re-forwards from its patch floor and registers its computed blocks under vanilla token hashes, so a later unpatched request with the same prompt could be served the patched KV (GPU repro: 0.47 max logprob corruption; only unnoticed because short validation prompts never filled a full block). Fold a deterministic patch-spec hash into the block hashes of all blocks at or after the lowest patched position (attention propagates the patch forward), the same mechanism steering uses. Blocks below the floor stay shareable, preserving the corrupt-prefix sharing that makes sweeps cheap; distinct specs get distinct KV chains. GPU-validated both ways: with the fix an unpatched rerun after a patched run is bit-identical to a fresh-engine ground truth; with the fix disabled it differs by 0.47.
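A minimal sketch of the hashing idea, assuming a simple SHA-256 chain over full blocks. The function name, the JSON-based spec encoding, and the chain layout are illustrative assumptions and do not reflect vLLM's actual block-hash implementation. It only shows the property described above: blocks below the floor keep their vanilla hash, while blocks at or after it depend on the patch spec.

```python
import hashlib
import json


def block_hashes(token_ids, block_size, patch_spec=None, patch_floor=None):
    """Chain-hash full blocks, salting every block that reaches the patch floor.

    patch_spec  -- any JSON-serializable description of the patch (illustrative)
    patch_floor -- lowest patched position, or None for an unpatched request
    """
    salt = b""
    if patch_spec is not None:
        canonical = json.dumps(patch_spec, sort_keys=True).encode()
        salt = hashlib.sha256(canonical).digest()

    hashes, parent = [], b""
    # Only full blocks are hashed; a trailing partial block is ignored.
    for start in range(0, len(token_ids) - block_size + 1, block_size):
        block = token_ids[start:start + block_size]
        h = hashlib.sha256(parent)
        h.update(json.dumps(block).encode())
        # A block is affected once it contains the floor or any later position.
        if patch_floor is not None and start + block_size > patch_floor:
            h.update(salt)
        parent = h.digest()
        hashes.append(parent.hex())
    return hashes


tokens = list(range(12))
vanilla = block_hashes(tokens, block_size=4)
patched = block_hashes(tokens, 4, patch_spec={"layer": 3, "pos": [5]}, patch_floor=5)
print(vanilla[0] == patched[0], vanilla[1] == patched[1])  # True False
```

Because the hashes are chained, every block after the first salted one also changes, which matches the point that attention propagates the patch forward.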

Sweep cells graded the answer/foil by looking them up in the generated top-k logprobs — an answer outside top-k graded as None (top-k boundary flicker), silently dropping cells from the grid. Use the engine's logprob_token_ids to score the answer/foil ids exactly on every request: the sweep endpoint resolves answer_token/foil_token to single token ids via the tokenizer (400 if multi-token), and PatchStudy resolves them via /tokenize, both passing the ids through (logprob_token_ids is now exposed on the completions API). The engine requires logprobs == len(ids) when ids are given. Live-validated: a token far outside top-1 is reported exactly; the full sweep grid grades every cell (0 top-k None-mismatches, 63/63 cells).
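The difference between the two grading strategies can be sketched over a toy logit vector. This is an illustration only: `grade_topk` and `grade_exact` are hypothetical helpers, not the sweep endpoint's code, and the six-token vocabulary is made up.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a plain list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def grade_topk(logprobs, token_id, k):
    """Top-k lookup: returns None when the token falls outside the top-k."""
    ranked = sorted(range(len(logprobs)), key=lambda i: logprobs[i], reverse=True)
    return logprobs[token_id] if token_id in ranked[:k] else None


def grade_exact(logprobs, token_ids):
    """Score the requested ids directly, wherever they rank."""
    return {tid: logprobs[tid] for tid in token_ids}


logprobs = log_softmax([4.0, 3.0, 2.0, 1.0, 0.0, -5.0])
answer_id, foil_id = 5, 1

print(grade_topk(logprobs, answer_id, k=3))  # None: the cell would be dropped
exact = grade_exact(logprobs, [answer_id, foil_id])
logit_diff = exact[answer_id] - exact[foil_id]  # always defined
```

With exact ids, a metric such as the answer-minus-foil logprob difference is defined for every cell, including those where the answer ranks far below the top-k.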

A source run evicted between admission (manifest check, positively cached) and worker resolution made the patch entry log-and-skip: the request ran UNPATCHED and its sweep cell silently reported the corrupt baseline as a patched result. Two layers of defense:

- Leases: the admission path leases referenced runs on the workers (throttled to ~one RPC per run per half-TTL); store eviction skips unexpired-leased runs, soft-exceeding the byte budget with a warning instead of un-patching in-flight requests. Live-validated: a leased run survives capture pressure that would previously have evicted it, and re-sweeps grade 4/4 cells.
- Backstop: any residual resolution miss is recorded per-request in a worker registry; the sweep endpoint drains it after each sweep (collective_rpc) and voids the affected cells (grid=None + skipped[] entries) instead of returning unpatched values.
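The lease behaviour can be sketched as a lease-aware LRU store. This is a simplified stand-in rather than the PR's PatchSourceStore: the class name, method names, and explicit `now` arguments are assumptions made so the example is deterministic.

```python
import time
from collections import OrderedDict


class LeasedRunStore:
    """Whole-run LRU that refuses to evict runs holding an unexpired lease."""

    def __init__(self, byte_budget):
        self.byte_budget = byte_budget
        self.runs = OrderedDict()  # run_id -> size in bytes, oldest first
        self.leases = {}           # run_id -> lease expiry time
        self.warnings = []

    def lease(self, run_id, ttl, now=None):
        now = time.monotonic() if now is None else now
        self.leases[run_id] = now + ttl

    def put(self, run_id, nbytes, now=None):
        now = time.monotonic() if now is None else now
        self.runs[run_id] = nbytes
        self.runs.move_to_end(run_id)
        self._evict(now, keep=run_id)

    def _evict(self, now, keep):
        for victim in list(self.runs):
            if sum(self.runs.values()) <= self.byte_budget:
                return
            leased = self.leases.get(victim, float("-inf")) > now
            if victim == keep or leased:
                continue  # never drop the new run or an unexpired lease
            del self.runs[victim]
        if sum(self.runs.values()) > self.byte_budget:
            # Soft-exceed the budget instead of un-patching in-flight requests.
            self.warnings.append("byte budget soft-exceeded by leased runs")


store = LeasedRunStore(byte_budget=100)
store.put("clean", 60, now=0.0)
store.lease("clean", ttl=30.0, now=0.0)
store.put("other", 60, now=1.0)   # over budget, but "clean" is leased
print("clean" in store.runs, len(store.warnings))  # True 1
```

Once the lease expires, the next capture under pressure evicts the run in normal LRU order.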

The process-global slot count had to be set by each runner before its model build — the v1 runner didn't, which shipped patching as a silent no-op there. Resolve the slot count inside maybe_register_patch_buffers from get_current_vllm_config_or_none() (models are always built under set_current_vllm_config, on every runner), removing the runner-side setup from both runners; the global remains only as a test-context fallback. GPU-checked: buffers register and patching validates on both runners with no runner code.

source_position == dest_position silently patches shifted positions when the clean and corrupt prompts tokenize to different lengths — a plausible-looking but wrong heatmap. Add alignment: equal lengths map identity (corresponding positions are the causal-tracing pairing); unequal lengths map the common token prefix by identity and the common suffix by the length delta, and skip the differing middle loudly (skipped[] + alignment summary in the response). The sweep endpoint takes clean_prompt and refuses a length mismatch without it (the source run's captured prompt length is exposed via the admission cache); PatchStudy records the clean prompt on CleanRun and aligns automatically on both the per-cell and server-side paths. Live-validated: mismatch 400s without clean_prompt; an 11-vs-9-token pair aligns (prefix 4, suffix 4, middle skipped) and grades 16/16 aligned cells.
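The alignment rule can be sketched as follows. `align_positions` is a hypothetical helper written from the description above, not the shared alignment code in the PR, and the token ids in the example are invented to reproduce the 11-vs-9 shape (prefix 4, suffix 4).

```python
def align_positions(clean_ids, corrupt_ids):
    """Return ({dest_pos: source_pos}, skipped_dest_positions).

    Source positions index the clean prompt; destination positions index
    the corrupt prompt that is being patched.
    """
    n_src, n_dst = len(clean_ids), len(corrupt_ids)
    if n_src == n_dst:
        # Equal lengths: corresponding positions are the causal-tracing pairing.
        return {i: i for i in range(n_dst)}, []

    limit = min(n_src, n_dst)
    prefix = 0
    while prefix < limit and clean_ids[prefix] == corrupt_ids[prefix]:
        prefix += 1

    # The suffix may not overlap the prefix on the shorter prompt.
    suffix = 0
    while (suffix < limit - prefix
           and clean_ids[n_src - 1 - suffix] == corrupt_ids[n_dst - 1 - suffix]):
        suffix += 1

    delta = n_src - n_dst
    mapping = {i: i for i in range(prefix)}
    for dst in range(n_dst - suffix, n_dst):
        mapping[dst] = dst + delta  # suffix positions shift by the length delta
    skipped = list(range(prefix, n_dst - suffix))
    return mapping, skipped


clean = [1, 2, 3, 4, 50, 51, 52, 7, 8, 9, 10]  # 11 tokens
corrupt = [1, 2, 3, 4, 60, 7, 8, 9, 10]        # 9 tokens
mapping, skipped = align_positions(clean, corrupt)
print(mapping)  # {0: 0, 1: 1, 2: 2, 3: 3, 5: 7, 6: 8, 7: 9, 8: 10}
print(skipped)  # [4]
```

Returning the skipped positions explicitly, rather than dropping them, is what lets a caller report them in a skipped[] list.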

vLLM is not batch-invariant by default, so identical requests in different batch compositions return slightly different logprobs. Rather than forcing batch-invariant mode (a server-wide throughput tax far below causal-tracing signal), each sweep re-runs the corrupt baseline inside the cell batch and reports |delta| vs the solo baseline as noise_floor — grid differences at or below it are not meaningful. Docs point at batch_invariance for exact reproducibility.
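One way a client might apply the reported noise floor is sketched below. The helper name and the convention that grid cells hold raw metric values compared against the solo baseline are assumptions. Only the definition of noise_floor as |in-batch baseline - solo baseline| comes from the description.

```python
def significant_cells(grid, solo_baseline, in_batch_baseline):
    """Return (noise_floor, mask); mask is True where a cell exceeds the floor.

    Cells that are None (voided or skipped) are never marked significant.
    """
    noise_floor = abs(in_batch_baseline - solo_baseline)
    mask = [
        [cell is not None and abs(cell - solo_baseline) > noise_floor for cell in row]
        for row in grid
    ]
    return noise_floor, mask


grid = [[-2.50, -2.48], [-0.30, None]]  # toy answer logprobs per (layer, position)
noise_floor, mask = significant_cells(grid, solo_baseline=-2.50, in_batch_baseline=-2.47)
print(mask)  # [[False, False], [True, False]]
```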

- gpu_patch_validate gains check F: at the best denoising site, alpha in {0, 0.5, 1} must move the answer logprob monotonically corrupt -> clean (exact grading via logprob_token_ids). Validates the lerp path between its endpoints, which was only CPU-tested.
- Chat admission rejects patch specs on multimodal prompts: prompt positions include image placeholder tokens, so patch positions would target placeholder activations, which is semantically undefined and unvalidated. Documented text-only scope.
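The lerp endpoints can be checked on plain floats. This sketch uses Python scalars rather than tensors, and both function names are illustrative. It contrasts the form (1-α)·h + α·t with the algebraically equivalent h + α·(t - h), which can miss the target at α=1 because of rounding in the subtraction.

```python
def lerp_precise(h, t, alpha):
    """(1 - alpha) * h + alpha * t: returns t exactly at alpha == 1 for finite h."""
    return (1.0 - alpha) * h + alpha * t


def lerp_delta(h, t, alpha):
    """h + alpha * (t - h): same algebra, but (t - h) is rounded first."""
    return h + alpha * (t - h)


h, t = 1e16, 1.0
print(lerp_precise(h, t, 1.0) == t)  # True
print(lerp_delta(h, t, 1.0) == t)    # False

# Intermediate alphas move monotonically from h (alpha=0) to t (alpha=1).
values = [lerp_precise(-4.0, -1.0, a) for a in (0.0, 0.5, 1.0)]
print(values)  # [-4.0, -2.5, -1.0]
```

The monotonic sequence here is over the interpolated value itself; the answer logprob measured in check F is a nonlinear function of the patched activation, so its monotonicity is an empirical property of the model rather than something this arithmetic guarantees.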


RhizoNymph added 10 commits June 27, 2026 20:08

feat: activation patching — data plane, injection plane, source store

2198b5a

feat: activation patching config, request spec, admission, and resolution

4adcc60

feat: PatchStudy client library for activation-patching sweeps

fa89ae0

test: offline GPU validation harness for activation patching

a3ea8ed

fix(capture): keep post_block capture op live so cudagraph writes the buffer

1500d71

fix: PatchStudy uses capture_wait so clean sources are durable before patching

e353e81

feat: scheduler per-site patch backpressure + admission source-existence check

c38a044

test: add TP/PP args to patch validation harness

73cd018

feat: server-side /v1/patch_sweep endpoint (one-call grid sweeps)

08ab2ed

test: live validation of /v1/patch_sweep vs per-cell path

724be82

RhizoNymph mentioned this pull request Jul 1, 2026

perf(patch): Level-2 (2a) trunk re-entry prototype + proof #218

Closed

RhizoNymph and others added 19 commits July 1, 2026 11:47

docs: activation patching feature doc + interp-infra OVERVIEW index

23dee96

Merge branch 'feat/integration' into feat/activation-patching

9930218

Merge remote-tracking branch 'origin/feat/integration' into feat/activation-patching

cf0b32b

docs(patch): sync feature doc with config-context registration + gemma3 TP2/PP2

f4d17b3

feat(patch): --enable-patching implies patch_source capture consumer

413b6c3

feat(patch): one-call sweeps via server-side auto-capture

624c7f3

refactor(patch): promote PatchStudy to vllm package, share alignment, add span positions

05ff050

Merge pull request #226 from RhizoNymph/feat/patch-implies-consumer

885d3f8

feat(patch): --enable-patching implies patch_source capture consumer

Merge pull request #228 from RhizoNymph/refactor/patch-client-module

3a27351

refactor(patch): promote PatchStudy client into the vllm package + span-based positions

Merge pull request #227 from RhizoNymph/feat/patch-sweep-auto-capture

e82fc62

feat(patch): one-call patch sweeps via server-side auto-capture

feat(patch): compose server-side spans + one-call auto-capture in sweeps

2645283

RhizoNymph and others added 11 commits July 3, 2026 22:34

refactor(patch): drop superseded resolve_positions helper

1c2d381

feat(patch): opt-in SSE streaming for /v1/patch_sweep grids

8886e82

feat(patch): multi-hook sweeps + source-run lifecycle (auto-drop, DELETE)

2be5ad5

fix(patch): stream cells and noise floor in the summary's metric units

4c5cec9

Merge pull request #235 from RhizoNymph/feat/patch-sweep-spans

a4f79ba

feat(patch): compose server-side spans + one-call auto-capture in sweeps

Merge pull request #236 from RhizoNymph/feat/patch-sweep-stream

ae47dbc

feat(patch): opt-in SSE streaming for /v1/patch_sweep grids

Merge feat/patch-sweep-spans (SSE streaming) into multi-hook lifecycle

7b8688d

test(patch): streamed multi-hook + auto-drop both-path coverage

74965ee

Merge pull request #237 from RhizoNymph/feat/patch-multihook-lifecycle

d4d920b

feat(patch): multi-hook sweeps + source-run lifecycle

Merge feat/patch-sweep-spans: streaming + multi-hook + source lifecycle

7e58cb1

fix(patch): scheduler backpressure must reserve against usable slots (slot 0 sentinel)

2371844

RhizoNymph wants to merge 40 commits into feat/integration from feat/activation-patching
