feat(patch): compose server-side spans + one-call auto-capture in sweeps #235

Merged
RhizoNymph merged 2 commits into feat/activation-patching from feat/patch-sweep-spans
Jul 5, 2026
@RhizoNymph

Owner

Makes "one request, no token indices" work from both the raw HTTP surface and the PatchStudy client. It does so by composing two recently merged ergonomic features that previously did not interoperate: substring spans were client-only, and one-call auto-capture was server-only.

Server-side span positions in /v1/patch_sweep

PatchSweepRequest.positions now accepts span objects mixed with integers:

"positions": [{"span": "Germany", "occurrence": 0}, 4]

A new SpanPosition model resolves server-side against the corrupt prompt (the destination run), tokenized identically to the sweep. Offsets come from the fast tokenizer's offset mapping when available (offsets index the original prompt), falling back to incremental detokenization of growing id prefixes; special tokens (e.g. BOS) map to an empty span and are never selected. Expansion is order-preserving with dedup across the whole list. Empty spans, missing substrings, and out-of-range occurrences are clean 400s. The response's positions remains the resolved integer axis.
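To make the resolution rules above concrete, here is a minimal sketch of how mixed span/integer positions could expand into token indices once per-token character offsets are known. The names `find_occurrence`, `span_to_tokens`, and `expand_positions` are hypothetical and are not the signatures used in this PR; the overlap rule and the use of `ValueError` (which a server would presumably map to a 400) are assumptions for illustration.

```python
from typing import Dict, List, Sequence, Tuple, Union

Offset = Tuple[int, int]  # (start_char, end_char), end exclusive
Position = Union[int, Dict[str, object]]


def find_occurrence(text: str, span: str, occurrence: int) -> int:
    """Return the start index of the occurrence-th (0-based) match of span."""
    if not span:
        raise ValueError("span must be non-empty")
    if occurrence < 0:
        raise ValueError("occurrence must be >= 0")
    start = -1
    for _ in range(occurrence + 1):
        start = text.find(span, start + 1)
        if start == -1:
            raise ValueError(f"span {span!r} occurrence {occurrence} not found")
    return start


def span_to_tokens(prompt: str, offsets: Sequence[Offset],
                   span: str, occurrence: int = 0) -> List[int]:
    """Token indices whose character range overlaps the matched substring."""
    lo = find_occurrence(prompt, span, occurrence)
    hi = lo + len(span)
    # Tokens with an empty range (e.g. BOS) are skipped by the s < e check.
    return [i for i, (s, e) in enumerate(offsets) if s < e and s < hi and e > lo]


def expand_positions(prompt: str, offsets: Sequence[Offset],
                     positions: Sequence[Position]) -> List[int]:
    """Expand spans in order and drop duplicates across the whole list."""
    seen = set()
    out: List[int] = []
    for p in positions:
        if isinstance(p, int):
            idxs = [p]
        else:
            idxs = span_to_tokens(prompt, offsets, str(p["span"]),
                                  int(p.get("occurrence", 0)))
        for i in idxs:
            if i not in seen:
                seen.add(i)
                out.append(i)
    return out


# Toy tokenization: BOS, "Berlin", " is", " in", " Germany"
prompt = "Berlin is in Germany"
offsets = [(0, 0), (0, 6), (6, 9), (9, 12), (12, 20)]
print(expand_positions(prompt, offsets, [{"span": "Germany", "occurrence": 0}, 4]))
# prints [4]
```

In this toy case the span and the explicit integer both point at token 4, so the resolved axis contains it once.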

The pure resolution math is relocated to a shared vllm/entrypoints/serve/patch/spans.py (resolve_span_positions, dedup_positions, incremental_char_offsets, prompt_char_offsets); the client imports it too, so there is a single copy in the tree.
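As a rough illustration of the fallback path, the sketch below derives per-token character offsets by decoding growing id prefixes and measuring how much text each new token adds. It is not the PR's `incremental_char_offsets`; the function name, the toy vocabulary, and the `decode` callable are all hypothetical, and the sketch assumes the decoded text matches the original prompt character for character, which real tokenizers do not always guarantee.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_offsets_sketch(
    ids: Sequence[int],
    decode: Callable[[Sequence[int]], str],
) -> List[Tuple[int, int]]:
    """Character range contributed by each token, via prefix decoding."""
    offsets: List[Tuple[int, int]] = []
    prev_len = 0
    for n in range(1, len(ids) + 1):
        # Clamp so a prefix that decodes shorter never yields a negative range.
        cur_len = max(prev_len, len(decode(ids[:n])))
        offsets.append((prev_len, cur_len))
        prev_len = cur_len
    return offsets


# Toy vocabulary: id 0 stands in for a special token that decodes to nothing.
VOCAB = {0: "", 1: "Berlin", 2: " is", 3: " in", 4: " Germany"}


def toy_decode(ids: Sequence[int]) -> str:
    return "".join(VOCAB[i] for i in ids)


print(incremental_offsets_sketch([0, 1, 2, 3, 4], toy_decode))
# prints [(0, 0), (0, 6), (6, 9), (9, 12), (12, 20)]
```

The special token ends up with an empty range, so an overlap test against any substring never selects it. This version decodes every prefix from scratch, which is quadratic in prompt length; that is fine for a sketch but may not reflect how the shared module does it.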

PatchStudy one-call wiring

sweep_layers_positions gains a clean_prompt keyword and makes run optional. With server_side=True and clean_prompt (and no captured clean/run), the client generates a fresh per-call run name and lets the server auto-capture, surfacing auto_captured/captured_source_run on SweepResult. Span markers are forwarded to the server for server_side=True sweeps (resolved there) and only resolved client-side on the per-cell path. An existing run wins over auto-capture; clean_prompt on the per-cell path without a captured handle raises a clear ValueError.
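The precedence described above can be sketched as a small decision function. This is not the PatchStudy implementation: `plan_sweep`, its return shape, and the `auto-` run-name prefix are hypothetical, the captured clean handle and `run` are collapsed into one argument for brevity, and raising when neither `run` nor `clean_prompt` is given is an assumption the PR text does not state.

```python
import uuid
from typing import Dict, Optional


def plan_sweep(server_side: bool,
               run: Optional[str] = None,
               clean_prompt: Optional[str] = None) -> Dict[str, object]:
    """Decide which source run a sweep uses and whether to auto-capture."""
    if run is not None:
        # An existing run wins over auto-capture, even if clean_prompt is set.
        return {"run": run, "auto_capture": False}
    if clean_prompt is None:
        # Assumption: with nothing to patch from, fail early.
        raise ValueError("either run or clean_prompt is required")
    if not server_side:
        # The per-cell path needs an already captured run.
        raise ValueError(
            "clean_prompt without a captured run requires server_side=True")
    # Fresh per-call run name; the server captures the clean run itself.
    return {
        "run": f"auto-{uuid.uuid4().hex}",
        "auto_capture": True,
        "clean_prompt": clean_prompt,
    }


plan = plan_sweep(server_side=True, clean_prompt="Paris is in France")
print(plan["auto_capture"])  # prints True
```

Generating the run name per call keeps repeated one-call sweeps from colliding on a shared capture.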

Docs and the runnable example are updated with the new JSON shape and a one-call PatchStudy snippet.
