feat(steering): dynamic steering — activation-conditioned steering #180
Draft
RhizoNymph wants to merge 125 commits into
…d decode steering)
…tier, packed banks)
…n with per-request actuation
…d activation_reward_producer Co-authored-by: Claude
…ed FP nondeterminism
…th operator decode steering
…l term), replace populate-folding
…cuit (Phase 2 M2)
…r-gain + in-graph probe)
fix(steering): content-keyed bounded probe tensor cache
fix(steering): bridged overrides preserve compose_admitted
fix(steering): fail-safe declarative gate resolution at admission
…arity fix(steering): port declarative override parity (compose+precedence) to v2 runner
…declarative-gate-fail-closed # Conflicts: # tests/v1/test_steering_schema.py
fix(steering): declarative probe gates fail closed
…gle-source op args
chore(steering): typed RowOwner state + refcount-0 purge + dirty-state grouping
…ness test(steering): cross-runner conformance harness for the control plane
fix(capture): port client_request_id sidecar + streaming metadata refresh to v1
…cksum feat(steering): cross-rank applied-action checksum in dynamic status
fix(steering): warmup matches runtime row-monitor specialization; single-source op args
feat(steering): worker-registered named vectors + latch-by-reference
test(steering): drop stale second arg from scheduler override hook call
…e/steering-trust-hardening # Conflicts: # vllm/v1/worker/steering_model_runner_mixin.py
test(steering): conformance harness tracks typed RowOwner keys
Bound latch memory + document dynamic-steering trust model
Draft. Ties activation capture to activation steering so activations decide when/how to steer. Three controller tiers (async → sync → in-graph), each configuring the one below. Design authority: docs/design/dynamic_steering.md (+ dynamic_steering_apc_notification.md, dynamic_steering_row_gating.md).
What's here
steering_action_queue (bounded, decode-tier-only validation), drained at the top of _update_steering_buffers.
execution="sync" consumer axis (every TP rank, 1-step latency); dynamic-override row pool (pure routing); observability + GET /v1/steering/dynamic; event-based on_step timing.
sigmoid(sharpness·(residual·probe − threshold)) and modulates the §5.4 tier same-forward; and per-request rows (decode-only, prefill protected via a decode mask) when gate_rows is set. emit_mode = scale | monitor.
Status / validation
GPU-validated on gemma4-31B: tp=1 (per-request actuation, tier, APC reuse), tp=2 cross-node (rank-replication + APC re-keying), pp=2, active in-graph monitor (tier + row gating), and row-gating kernel/op/cudagraph parity. Extensive CPU suites.
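For reviewers who want the gate arithmetic spelled out, here is a minimal pure-Python sketch of sigmoid(sharpness·(residual·probe − threshold)) with a decode mask. It is not the PR's kernel or op: the function names, the additive steering update, and the reading of emit_mode (scale applies the gate, monitor only reports it) are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gate_value(residual: Sequence[float], probe: Sequence[float],
               sharpness: float, threshold: float) -> float:
    # sigmoid(sharpness * (residual . probe - threshold))
    score = sum(r * p for r, p in zip(residual, probe))
    return _sigmoid(sharpness * (score - threshold))


def apply_row_gating(residuals: Sequence[Sequence[float]],
                     steering: Sequence[float],
                     probe: Sequence[float],
                     sharpness: float,
                     threshold: float,
                     decode_mask: Sequence[bool],
                     emit_mode: str = "scale") -> Tuple[List[List[float]], List[float]]:
    # Hypothetical helper: one gate per row. Only decode rows are modified,
    # and only in "scale" mode (assumed semantics); prefill rows pass through.
    if emit_mode not in ("scale", "monitor"):
        raise ValueError(f"unknown emit_mode: {emit_mode!r}")
    out: List[List[float]] = []
    gates: List[float] = []
    for row, is_decode in zip(residuals, decode_mask):
        g = gate_value(row, probe, sharpness, threshold)
        gates.append(g)
        if emit_mode == "scale" and is_decode:
            out.append([x + g * s for x, s in zip(row, steering)])
        else:
            out.append(list(row))
    return out, gates
```

In this sketch a row whose probe score equals the threshold gets a gate of exactly 0.5, and sharpness controls how quickly the gate saturates on either side of it.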
Notes for review
model_runner_v2 integration (upstream dev-flag-gated).
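As a review aid for the bounded steering_action_queue with decode-tier-only validation, drained at the top of _update_steering_buffers, the sketch below shows one way that pattern can look. Every name in it (SteeringAction, BoundedActionQueue, update_steering_buffers, the tier strings) is hypothetical, and the policy of rejecting new actions when the queue is full is an assumption, not something this PR states.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List


@dataclass(frozen=True)
class SteeringAction:
    # Hypothetical action record; the PR's real schema may differ.
    request_id: str
    tier: str
    scale: float


class BoundedActionQueue:
    def __init__(self, maxsize: int) -> None:
        if maxsize <= 0:
            raise ValueError("maxsize must be positive")
        self._maxsize = maxsize
        self._items: Deque[SteeringAction] = deque()

    def put(self, action: SteeringAction) -> bool:
        # Validation: only decode-tier actions are accepted.
        if action.tier != "decode":
            raise ValueError("only decode-tier actions are accepted")
        # Bound: reject rather than grow (assumed overflow policy).
        if len(self._items) >= self._maxsize:
            return False
        self._items.append(action)
        return True

    def drain(self) -> List[SteeringAction]:
        # Return everything queued so far, in arrival order, and empty the queue.
        drained = list(self._items)
        self._items.clear()
        return drained


def update_steering_buffers(queue: BoundedActionQueue,
                            scales: Dict[str, float]) -> Dict[str, float]:
    # Drain first, so actions enqueued before this step take effect in it.
    for action in queue.drain():
        scales[action.request_id] = action.scale
    return scales
```

Draining at the top of the update step keeps the consumer's view consistent for the whole step: anything enqueued afterwards waits for the next one.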