
fix(steering): warmup matches runtime row-monitor specialization; single-source op args #230

Merged: RhizoNymph merged 1 commit into feat/dynamic-steering from fix/steering-op-args-warmup on Jul 5, 2026.

@RhizoNymph (Owner)

Two related data-plane fixes to the steering kernel path.

Fix 1 — warmup compiled the wrong Triton specialization (default config)

warmup_apply_steering_kernel always allocated full-size per-row-monitor buffers (rprobe = (table_rows, hidden), rparams = (table_rows, 2)). But with enable_row_monitor=False, which is the default, layers keep the registered (1, 1) / (1, 2) dummy buffers, because resize_steering_row_monitor_buffers is a no-op when the row monitor is disabled. The kernel receives the per-row probe table's leading stride, rp_stride_r = probe_table.stride(0), which is 1 for the dummy but hidden_size for warmup's buffer. Triton specializes integer arguments that equal 1 (treating them as constexpr) and those divisible by 16, so the two calls produce different cache keys. Warmup therefore compiled a variant the default runtime never hits, and the first real forward pass JIT-compiled a fresh one: exactly the served-window / capture-time cost that warmup exists to prevent.
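To make the cache-key mismatch concrete, here is a toy model in plain Python. It assumes contiguous row-major layout and reduces Triton's integer-argument specialization to the two properties named above (value equal to 1, divisibility by 16). `contiguous_strides` and `int_spec_key` are hypothetical helpers, not Triton APIs, and Triton's real cache key carries more than this.

```python
def contiguous_strides(shape):
    """Row-major (C-contiguous) strides in elements, as torch reports for a fresh tensor."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def int_spec_key(value):
    """Simplified, hypothetical model of how Triton buckets an int kernel argument.

    Only the ==1 and divisible-by-16 cases described in the PR are modeled.
    """
    if value == 1:
        return "eq1"      # folded into the compiled kernel as a constant
    if value % 16 == 0:
        return "div16"    # alignment property baked into the compiled variant
    return "generic"


HIDDEN, TABLE_ROWS = 128, 8

dummy_probe = (1, 1)                  # registered buffer when the row monitor is disabled
warmup_probe = (TABLE_ROWS, HIDDEN)   # what the old warmup always allocated

# rp_stride_r is the leading stride, i.e. stride(0), of the probe table.
runtime_key = int_spec_key(contiguous_strides(dummy_probe)[0])
warmup_key = int_spec_key(contiguous_strides(warmup_probe)[0])
print(runtime_key, warmup_key)  # eq1 div16 -> two different compiled variants
```

Because the two keys differ, a cache warmed with the full-size buffer offers nothing to a default-config runtime call.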

Fix: thread row_monitor_enabled into warmup_apply_steering_kernel. When it is False, allocate (1, 1) / (1, 2) buffers matching the registered dummies; when it is True, keep the full-size buffers (matching resize_steering_row_monitor_buffers). The caller in steering_model_runner_mixin.py passes the runner's _row_monitor_enabled state.
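A minimal sketch of the shape selection, assuming the dummy and full-size shapes stated above. `row_monitor_warmup_shapes` and `RowMonitorShapes` are hypothetical names for illustration; the real warmup allocates tensors rather than returning shapes.

```python
from typing import NamedTuple, Tuple

Shape = Tuple[int, ...]


class RowMonitorShapes(NamedTuple):
    rprobe: Shape
    rparams: Shape


def row_monitor_warmup_shapes(row_monitor_enabled: bool, table_rows: int, hidden: int) -> RowMonitorShapes:
    """Pick warmup buffer shapes that match what layers actually hold at runtime."""
    if not row_monitor_enabled:
        # Mirror the registered dummies: resizing is a no-op when disabled.
        return RowMonitorShapes(rprobe=(1, 1), rparams=(1, 2))
    # Mirror the full-size buffers produced by the resize path when enabled.
    return RowMonitorShapes(rprobe=(table_rows, hidden), rparams=(table_rows, 2))


print(row_monitor_warmup_shapes(False, table_rows=8, hidden=128))
print(row_monitor_warmup_shapes(True, table_rows=8, hidden=128))
```

Keying the allocation on the same flag the runtime uses is what keeps warmup and runtime on the same Triton specialization.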

The old warmup regression test (test_subsequent_invocations_at_warmed_shape_no_new_variants) built its runtime-mimicking call with a full-size (8, 128) probe table. That replicated warmup's wrong shape rather than the true default runtime shape, so the test asserted the buggy behavior and passed. It is fixed here to use the real default-config shapes and is parametrized over both row-monitor enable states, asserting in each that the JIT cache does not grow after runtime-shaped calls.
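The logic of the fixed test can be sketched with a toy cache. This is a stand-in under loud assumptions: the real test inspects Triton's JIT cache on CUDA with the actual steering buffers, while here a set of specialization keys plays that role and only the probe-table row stride is modeled. `ToyJitCache`, `runtime_probe_stride`, `check_no_new_variants`, and `old_warmup_grows` are all hypothetical.

```python
class ToyJitCache:
    """Stand-in for a JIT cache: one compiled 'variant' per specialization key."""

    def __init__(self):
        self.variants = set()

    def launch(self, rp_stride_r: int) -> None:
        # Simplified Triton-style buckets for the probe-table row stride only.
        if rp_stride_r == 1:
            key = "eq1"
        elif rp_stride_r % 16 == 0:
            key = "div16"
        else:
            key = "generic"
        self.variants.add(key)  # a new key means a fresh compile


def runtime_probe_stride(row_monitor_enabled: bool, hidden: int) -> int:
    # stride(0) of the runtime probe table: (1, 1) dummy vs (table_rows, hidden).
    return hidden if row_monitor_enabled else 1


def check_no_new_variants(row_monitor_enabled: bool, hidden: int = 128) -> int:
    cache = ToyJitCache()
    # Fixed warmup: use the same shapes the runtime will use.
    cache.launch(runtime_probe_stride(row_monitor_enabled, hidden))
    warmed = len(cache.variants)
    for _ in range(3):  # several runtime-shaped calls
        cache.launch(runtime_probe_stride(row_monitor_enabled, hidden))
    assert len(cache.variants) == warmed, "runtime call compiled a new variant"
    return warmed


def old_warmup_grows(hidden: int = 128) -> bool:
    cache = ToyJitCache()
    cache.launch(hidden)  # old warmup: always full-size, so stride(0) == hidden
    warmed = len(cache.variants)
    cache.launch(1)       # true default runtime: (1, 1) dummy, so stride(0) == 1
    return len(cache.variants) > warmed


for enabled in (False, True):  # mirrors the parametrization over both states
    check_no_new_variants(enabled)
print(old_warmup_grows())  # True: the old warmup left the default path cold
```

The old test effectively called `launch(hidden)` for both warmup and "runtime", which is why it could not see the extra compile.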

The sibling steering_monitor_kernel.py warmup has no analogous issue: none of the monitor op's tensors have a dummy-vs-full-size distinction.

Fix 2 — single source of truth for the 15-arg op signature

The 15-tensor positional list was repeated at roughly 8 sites (emit, op impl, fake, Triton wrapper, the kernel launch where it is interleaved with ~16 stride scalars, warmup ×2, and tests). Because all arguments are same-typed tensors, a transposition still type-checks and fails only behaviorally.

  • Added SteeringOpArgs(NamedTuple) (15 tensor fields in canonical order) plus a _build_steering_op_args builder used by _emit_steering_op and by warmup. The registered op, fake, and Triton wrapper keep their flat signatures (torch custom-op schemas require flat tensors) as the only flat sites, each mirroring the NamedTuple order.
  • Added a CPU schema-lock test asserting SteeringOpArgs._fields equals the registered op schema's argument names in order.
  • The Triton launch derives its stride scalars via _steering_kernel_strides, so tensor/stride pairing is generated from the tensors rather than hand-zipped at the highest-risk interleaved site. The emitted launch is identical.
  • The three bool flags are left as separate tensors (packing them would collide with the steering_monitor_off substitution used in cross-layer monitor mode). No change to the op schema, arity, or any kernel semantics.
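The single-source pattern from the bullets above can be sketched as follows. Everything here is hypothetical: the PR does not list the 15 field names, so the tuple is abbreviated to five invented ones, shapes stand in for tensors so the sketch runs without torch, and the schema string is made up. `schema_arg_names` parses that string with a regex rather than torch's schema objects, and `kernel_strides` emits every stride of every tensor, whereas the real `_steering_kernel_strides` presumably selects only the ones the kernel takes.

```python
import re
from typing import NamedTuple, Tuple

Shape = Tuple[int, ...]


class SteeringOpArgsSketch(NamedTuple):
    """Canonical argument order (hypothetical, abbreviated from 15 fields).

    Shapes stand in for tensors so the sketch runs without torch.
    """

    hidden: Shape
    steer_vec: Shape
    row_probe: Shape
    row_params: Shape
    enabled_flag: Shape


def build_steering_op_args(*, hidden, steer_vec, row_probe, row_params, enabled_flag) -> SteeringOpArgsSketch:
    # Keyword-only: a swapped or misspelled name fails loudly instead of
    # silently shifting a same-typed argument into the wrong slot.
    return SteeringOpArgsSketch(
        hidden=hidden,
        steer_vec=steer_vec,
        row_probe=row_probe,
        row_params=row_params,
        enabled_flag=enabled_flag,
    )


# Hypothetical registered schema string for the flat custom op.
SCHEMA = (
    "steering::apply(Tensor hidden, Tensor steer_vec, Tensor row_probe, "
    "Tensor row_params, Tensor enabled_flag) -> Tensor"
)


def schema_arg_names(schema: str) -> Tuple[str, ...]:
    # Take the text between the outer parentheses and keep each argument's name.
    inner = re.search(r"\((.*)\)\s*->", schema).group(1)
    return tuple(arg.split()[-1] for arg in inner.split(","))


def contiguous_strides(shape: Shape) -> Shape:
    strides, step = [], 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def kernel_strides(args: SteeringOpArgsSketch) -> Tuple[int, ...]:
    # Derive stride scalars from the tensors in canonical field order, so the
    # tensor/stride pairing cannot drift from the argument order.
    out = []
    for shape in args:
        out.extend(contiguous_strides(shape))
    return tuple(out)


# Schema lock: NamedTuple field order must equal the registered schema order.
assert SteeringOpArgsSketch._fields == schema_arg_names(SCHEMA)

args = build_steering_op_args(
    hidden=(4, 128), steer_vec=(1, 128), row_probe=(1, 1),
    row_params=(1, 2), enabled_flag=(1,),
)
print(kernel_strides(args))  # (128, 1, 128, 1, 1, 1, 2, 1, 1)
```

The flat op, fake, and wrapper signatures would still list the names positionally; the schema-lock assertion is what catches them drifting from the NamedTuple.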

Notes

Both changes are host-side / warmup-only and produce byte-identical kernel behavior. CPU tests pass:

  • test_steering_op.py, test_block_steering.py, test_steering_monitor_op.py, test_steering_monitor.py, test_steering_row_monitor.py, test_steering_warmup.py — 57 passed, 5 skipped (CUDA-only).

GPU confirmation of the warmup cache-idempotency assertions is pending (those parts skip without CUDA).

Sibling-PR conflicts: none expected on steering.py / steering_kernel.py. chore/steering-row-owner also touches steering_model_runner_mixin.py; the one-line caller change here may conflict trivially.

RhizoNymph merged commit d3258d4 into feat/dynamic-steering on Jul 5, 2026.