fix(steering): warmup matches runtime row-monitor specialization; single-source op args by RhizoNymph · Pull Request #230 · RhizoNymph/vllm

RhizoNymph · 2026-07-03T07:07:55Z

Two related data-plane fixes to the steering kernel path.

Fix 1 — warmup compiled the wrong Triton specialization (default config)

warmup_apply_steering_kernel always allocated full-size per-row-monitor buffers (rprobe = (table_rows, hidden), rparams = (table_rows, 2)). With enable_row_monitor=False (the default), however, layers keep the registered (1, 1) / (1, 2) dummy buffers, because resize_steering_row_monitor_buffers is a no-op when the monitor is disabled. The kernel receives the per-row probe table's leading stride, rp_stride_r = probe_table.stride(0), which is 1 for the dummy but hidden_size for warmup's buffer. Triton specializes integer args on == 1 (constexpr) and on divisibility by 16, so the two cache keys differ. Warmup therefore compiled a variant the default runtime never hits, and the first real forward pass JIT-compiled a fresh one, incurring exactly the served-window/capture-time cost that warmup exists to prevent.
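The mismatch can be illustrated with a small pure-Python sketch. It is not the Triton implementation: int_arg_spec is a hypothetical, simplified model of the two integer-argument properties named above, contiguous_strides is a hypothetical helper, and the sizes (8 rows, hidden size 128) are illustrative values only.

```python
def contiguous_strides(shape):
    """Row-major strides (in elements) of a contiguous tensor with this shape."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def int_arg_spec(value):
    """Simplified stand-in for an integer-arg specialization key:
    (is the value exactly 1?, is it divisible by 16?)."""
    return (value == 1, value % 16 == 0)


table_rows, hidden = 8, 128  # illustrative sizes, not taken from the PR

# Default runtime: the registered (1, 1) dummy probe table.
dummy_stride = contiguous_strides((1, 1))[0]
# Old warmup: a full-size (table_rows, hidden) probe table.
full_stride = contiguous_strides((table_rows, hidden))[0]

dummy_key = int_arg_spec(dummy_stride)
full_key = int_arg_spec(full_stride)

print("runtime rp_stride_r:", dummy_stride, "key:", dummy_key)
print("warmup  rp_stride_r:", full_stride, "key:", full_key)
print("same specialization:", dummy_key == full_key)
```

Under this model the two leading strides land in different buckets on both properties, which is why a variant compiled for one cannot serve the other.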

Fix: thread row_monitor_enabled into warmup_apply_steering_kernel. When it is False, warmup allocates (1, 1) / (1, 2) buffers matching the registered dummies; when it is True, warmup keeps the full-size buffers (matching resize_steering_row_monitor_buffers). The caller in steering_model_runner_mixin.py passes the runner's _row_monitor_enabled state.

The old warmup regression test (test_subsequent_invocations_at_warmed_shape_no_new_variants) built its runtime-mimic with a full-size (8, 128) probe table — replicating warmup's wrong shape rather than the true default runtime shape — so it asserted the buggy behavior and passed. That test is fixed here to use the real default-config shapes and is parametrized over both row-monitor enable states, asserting the JIT cache does not grow after runtime-shaped calls in each.
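The shape of that assertion can be sketched with a toy cache in place of Triton's JIT cache. Everything here is a stand-in: ToyJitCache, probe_table_shape and cache_growth are hypothetical names, the cache key models only the leading-stride specialization, and the real test inspects the actual kernel cache on CUDA.

```python
class ToyJitCache:
    """Toy JIT cache keyed only on the leading-stride specialization."""

    def __init__(self):
        self.variants = set()

    def launch(self, rp_stride_r):
        # A "compile" happens whenever an unseen key is added.
        self.variants.add((rp_stride_r == 1, rp_stride_r % 16 == 0))


def probe_table_shape(row_monitor_enabled, table_rows, hidden):
    """Full-size table when the monitor is enabled, (1, 1) dummy otherwise."""
    return (table_rows, hidden) if row_monitor_enabled else (1, 1)


def cache_growth(warmup_shape, runtime_shape, calls=3):
    """Number of new variants compiled by runtime calls after warmup."""
    cache = ToyJitCache()
    cache.launch(warmup_shape[1])  # stride(0) of a contiguous 2-D table
    warmed = len(cache.variants)
    for _ in range(calls):
        cache.launch(runtime_shape[1])
    return len(cache.variants) - warmed


table_rows, hidden = 8, 128  # illustrative sizes

# Fixed warmup: same shape as the runtime, in both enable states.
fixed_growth = {}
for enabled in (False, True):
    shape = probe_table_shape(enabled, table_rows, hidden)
    fixed_growth[enabled] = cache_growth(shape, shape)

# Old warmup: always full-size, while the default runtime uses the dummy.
old_growth_default = cache_growth((table_rows, hidden), (1, 1))

print("growth with fixed warmup:", fixed_growth)
print("growth with old warmup, monitor disabled:", old_growth_default)
```

The old test corresponds to calling cache_growth with the full-size shape on both sides, which reports zero growth even though the default runtime shape was never warmed.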

The sibling steering_monitor_kernel.py warmup has no analogous issue: none of the monitor op's tensors have a dummy-vs-full-size distinction.

Fix 2 — single source of truth for the 15-arg op signature

The 15-tensor positional list was repeated at ~8 sites (emit, op impl, fake, Triton wrapper, kernel launch interleaved with ~16 stride scalars, warmup ×2, tests). All args are same-typed tensors, so a transposition type-checks and fails only behaviorally.
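A reduced illustration of that failure mode, using three list-of-float "tensors" instead of fifteen. The names steer and SteerArgs and their fields are hypothetical and are not the PR's identifiers.

```python
from typing import List, NamedTuple

Vec = List[float]


def steer(base: Vec, delta: Vec, gate: Vec) -> Vec:
    """Flat positional signature: every parameter has the same type."""
    return [b + g * d for b, d, g in zip(base, delta, gate)]


class SteerArgs(NamedTuple):
    """Single place that fixes the canonical argument order."""
    base: Vec
    delta: Vec
    gate: Vec


base, delta, gate = [1.0, 2.0], [10.0, 20.0], [0.0, 1.0]

correct = steer(base, delta, gate)
# Swapping two same-typed arguments still type-checks and still runs.
transposed = steer(delta, base, gate)

# Building by field name makes call-site ordering irrelevant;
# unpacking then yields the canonical order.
args = SteerArgs(gate=gate, delta=delta, base=base)
via_named = steer(*args)

print(correct, transposed, via_named)
```

Only the output values reveal the transposition, which is the sense in which such a bug fails behaviorally rather than at type-check time.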

Added SteeringOpArgs(NamedTuple) (15 tensor fields in canonical order) plus a _build_steering_op_args builder used by _emit_steering_op and by warmup. The registered op, fake, and Triton wrapper keep their flat signatures (torch custom-op schemas require flat tensors) as the only flat sites, each mirroring the NamedTuple order.
Added a CPU schema-lock test asserting SteeringOpArgs._fields equals the registered op schema's argument names in order.
The Triton launch derives its stride scalars via _steering_kernel_strides, so tensor/stride pairing is generated from the tensors rather than hand-zipped at the highest-risk interleaved site. The emitted launch is identical.
The three bool flags are left as separate tensors (packing them would collide with the steering_monitor_off substitution used in cross-layer monitor mode). No change to the op schema, arity, or any kernel semantics.
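The schema-lock and stride-derivation ideas can be sketched together in plain Python. This is an approximation under stated assumptions: ToyOpArgs, toy_op, FakeTensor, flat_arg_names and kernel_strides are hypothetical stand-ins with three fields rather than fifteen, and inspect.signature takes the place of reading the registered op schema.

```python
import inspect
from typing import NamedTuple


class FakeTensor:
    """Stand-in that exposes only a contiguous .stride()."""

    def __init__(self, shape):
        self.shape = tuple(shape)

    def stride(self):
        strides, step = [], 1
        for dim in reversed(self.shape):
            strides.append(step)
            step *= dim
        return tuple(reversed(strides))


class ToyOpArgs(NamedTuple):
    """Canonical argument order (hypothetical field names)."""
    hidden: FakeTensor
    probe_table: FakeTensor
    row_params: FakeTensor


def toy_op(hidden, probe_table, row_params):
    """A flat-signature site that must mirror ToyOpArgs."""
    return None


def flat_arg_names(fn):
    """Parameter names of a flat function, in declaration order."""
    return tuple(inspect.signature(fn).parameters)


def kernel_strides(args):
    """Derive stride scalars from the tensors, in field order,
    instead of writing them out by hand next to each tensor."""
    out = []
    for tensor in args:
        out.extend(tensor.stride())
    return tuple(out)


args = ToyOpArgs(
    hidden=FakeTensor((4, 128)),
    probe_table=FakeTensor((1, 1)),
    row_params=FakeTensor((1, 2)),
)

schema_locked = ToyOpArgs._fields == flat_arg_names(toy_op)
strides = kernel_strides(args)

print("schema locked:", schema_locked)
print("derived strides:", strides)
```

Reordering or renaming a parameter of toy_op without updating ToyOpArgs makes the schema_locked comparison fail, which is the property the CPU test is described as asserting.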

Notes

Both changes are host-side / warmup-only and produce byte-identical kernel behavior. CPU tests pass:

test_steering_op.py, test_block_steering.py, test_steering_monitor_op.py, test_steering_monitor.py, test_steering_row_monitor.py, test_steering_warmup.py — 57 passed, 5 skipped (CUDA-only).

GPU confirmation of the warmup cache-idempotency assertions is pending (those parts skip without CUDA).

Sibling-PR conflicts: none expected on steering.py / steering_kernel.py. chore/steering-row-owner also touches steering_model_runner_mixin.py; the one-line caller change here may conflict trivially.

Commit 39d5532: fix(steering): warmup matches runtime row-monitor specialization; single-source op args

RhizoNymph merged commit d3258d4 into feat/dynamic-steering Jul 5, 2026

RhizoNymph merged 1 commit into feat/dynamic-steering from fix/steering-op-args-warmup