docs(vcr-ra): RISC-V lever-parity scoping spike — map ARM perf levers to RV32 (#472, #242)#484

Merged
avrabe merged 1 commit into main from vcr-ra/472-riscv-scoping
Jun 25, 2026
@avrabe avrabe commented Jun 25, 2026

#472 scoping spike — frozen-safe (no codegen change)

Maps each ARM perf lever (landed v0.13–v0.15) to its RV32IMAC status, source-grounded in synth-backend-riscv/src/selector.rs and measured by .text size. The byte-changing ports are explicitly separate, gated follow-up steps.

Scope-changing finding

The port is 2 levers + 1 RISC-V-specific fold, not a 1:1 port of the three named ARM levers:

| ARM lever | RV32 status |
| --- | --- |
| cmp→select | N/A for RV32IMAC: no conditional-move (Zicond not in IMAC, no IT-predication); lower_select is already the minimal branchy form. |
| local-promotion | APPLIES (direct #390 analogue): non-param i32 locals always frame-spilled (sw/lw off(sp)); port to s-register homing, leaf-only, carrying the #474 fallback from the start. |
| immediate-shift-fold | APPLIES (RV form): const shift amounts use register sll/sra/srl (li tmp,N; sll); fold to slli/srli/srai (ops already exist). |
| const-address-fold | APPLIES (RISC-V-specific): RV already holds the base in s11 (no #468-style re-materialization), but const lw/sw addresses do li addr; add tmp,s11,addr; lw/sw off(tmp) instead of folding to lw/sw (ADDR+off)(s11). |
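The const-address-fold row can be illustrated with a minimal sketch. It assumes the fold is only possible when `ADDR+off` fits the signed 12-bit immediate of RV32 `lw`/`sw`; that range is an architectural fact, but the helper name `fold_const_address` and its signature are hypothetical and not taken from `selector.rs`. How the planned lever treats larger addresses is not stated in this PR.

```rust
/// Hypothetical helper (not from the Synth source): decide whether a constant
/// linear-memory address plus the access offset can be encoded directly as the
/// immediate of `lw`/`sw` relative to the base register (s11).
fn fold_const_address(addr: u32, offset: u32) -> Option<i32> {
    // Sum in 64 bits so the addition itself cannot wrap.
    let effective = addr as u64 + offset as u64;
    // RV32 load/store immediates are signed 12-bit (-2048..=2047), so a
    // non-negative effective address must be at most 2047 to fold.
    if effective <= 2047 {
        Some(effective as i32)
    } else {
        // Out of range: keep `li addr; add tmp,s11,addr; lw/sw off(tmp)`.
        None
    }
}
```

For example, a constant address of 16 with offset 4 would fold to `lw rd, 20(s11)`, while address 2040 with offset 8 would keep the three-instruction form.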

Measured .text (RV32 vs ARM)

| fixture | ARM | RV32 | headroom |
| --- | --- | --- | --- |
| redundant_base_materialization | 336 B | 120 B (30 insn) | const-addr-fold ~56 B |
| leaf_caller_saved | 200 B | 104 B | local-promotion (sw/lw traffic) |
| shifts | 188 B | 44 B | imm-shift-fold ~8 B |

Gated plan

Per-lever PRs (imm-shift-fold → const-address-fold → local-promotion), each flag-off → RV32 differential → qemu_riscv32/ESP32-C3 cycle gate → flip. Oracle gap noted: the RV32 path has no cargo byte-gate and no local RISC-V disassembler — the differential needs an RV32 execution harness + a small instruction decoder, built as part of step 1.
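As a rough idea of what the "small instruction decoder" in the oracle gap might need for step 1, here is a sketch that recognizes only the six shift encodings involved in imm-shift-fold. The field layout and opcode values follow the standard RV32I encoding; the type and function names (`Shift`, `ShiftOp`, `decode_shift`) are hypothetical, and 16-bit compressed encodings are deliberately not handled.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum ShiftOp { Sll, Srl, Sra }

/// Hypothetical decoded form: register shift vs. immediate shift.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Shift {
    Reg { op: ShiftOp, rd: u8, rs1: u8, rs2: u8 },
    Imm { op: ShiftOp, rd: u8, rs1: u8, shamt: u8 },
}

/// Decode one 32-bit instruction word; anything that is not a shift is `None`.
fn decode_shift(word: u32) -> Option<Shift> {
    let opcode = word & 0x7f;
    let rd = ((word >> 7) & 0x1f) as u8;
    let funct3 = (word >> 12) & 0x7;
    let rs1 = ((word >> 15) & 0x1f) as u8;
    // Bits 24:20 hold rs2 for register shifts and shamt for immediate shifts.
    let rs2_or_shamt = ((word >> 20) & 0x1f) as u8;
    let funct7 = word >> 25;

    let op = match (funct3, funct7) {
        (0b001, 0x00) => ShiftOp::Sll,
        (0b101, 0x00) => ShiftOp::Srl,
        (0b101, 0x20) => ShiftOp::Sra,
        _ => return None,
    };
    match opcode {
        0x33 => Some(Shift::Reg { op, rd, rs1, rs2: rs2_or_shamt }), // OP
        0x13 => Some(Shift::Imm { op, rd, rs1, shamt: rs2_or_shamt }), // OP-IMM
        _ => None,
    }
}
```

A differential could use such a decoder to count register-form versus immediate-form shifts in the emitted `.text` for each flag state.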

This is the frozen-safe measure-before-optimize scoping for #472's implementation, mirroring the #468 scoping doc that proved load-bearing.

🤖 Generated with Claude Code

… to RV32 (#472, #242)

Frozen-safe scoping spike (no codegen change) for the RISC-V lever port. Reads the
RV32IMAC backend source and measures the per-function overhead, mapping each ARM
perf lever to its RV32 status:

- cmp→select: N/A for RV32IMAC — no conditional-move (Zicond not in IMAC, no
  predication); `lower_select` is already the minimal branchy form.
- local-promotion: APPLIES (direct #390 analogue) — non-param i32 locals are
  always frame-spilled (sw/lw off(sp)); port to s-register homing, leaf-only,
  carrying the #474 promotion-exhaustion fallback from the start.
- immediate-shift-fold: APPLIES (RV form) — const shift amounts use the register
  sll/sra/srl (li tmp,N; sll); fold to slli/srli/srai (the ops already exist).
- const-address-fold: APPLIES (RISC-V-specific) — RV already holds the linmem base
  in s11 (no base re-materialization, so #468's base-hoist half is N/A), but const
  lw/sw addresses do `li addr; add tmp,s11,addr; lw/sw off(tmp)` instead of folding
  to `lw/sw (ADDR+off)(s11)`.

Scope-changing finding: the port is 2 levers + 1 RISC-V-specific fold, not a 1:1
port of all three named ARM levers (cmp→select does not apply to RV32IMAC).

Measured .text (RV32 vs ARM): redundant_base 120B/30insn (const-addr-fold headroom
~56B), leaf_caller_saved 104B (local-promotion), shifts 44B (imm-shift-fold ~8B).

Lays out the gated per-lever implementation plan (each flag-off → RV32 differential
→ qemu_riscv32/ESP32-C3 cycle gate → flip) and notes the oracle gap: the RV32 path
has no cargo byte-gate and no local RISC-V disassembler, so the differential needs
an RV32 execution harness + a small instruction decoder, built as part of step 1.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@avrabe avrabe merged commit ce50642 into main Jun 25, 2026
10 checks passed
@avrabe avrabe deleted the vcr-ra/472-riscv-scoping branch June 25, 2026 07:51
codecov Bot commented Jun 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

avrabe added a commit that referenced this pull request Jun 25, 2026
…#472, #242) (#486)

Traceability sync: the VCR-* roadmap's update logs had drifted behind the shipped
RISC-V lever-port prep. Records the RV32 lever-baseline slice (#472/#484/#485) under
VCR-ORACLE-001, which is its accurate home because that item already logs the RV32
oracle slices (the frozen-fixture byte gate and the cmp-select execution differential).

The entry captures two things. First, the three `*_baseline_472` selector tests pin the
current pre-lever RV32 codegen at the RiscVOp-stream level (const-address store unfolded,
register-form shift, frame-spilled local); they are green today and flip when each lever
lands default-on, so a codegen change on the un-byte-gated RV32 path surfaces as a
reviewed assertion update. Second, the scoping finding that reshaped the port:
cmp→select is N/A for RV32IMAC (no conditional-move), so the port is local-promotion +
immediate-shift-fold + a RISC-V-specific const-address-fold, not a 1:1 port.

Frozen-safe: a single description append + a `riscv` tag on an existing item; no
status change, no new links. rivet validate clean (0 non-cross-repo errors under the
CI gate). The ARM perf levers' roadmap reconciliation is deliberately left to a
focused pass rather than slotted into ambiguous homes here.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
avrabe added a commit that referenced this pull request Jun 25, 2026
…i/srli/srai, flag-off (#472, #242) (#487)

Port the first applicable ARM perf lever to the RV32 backend (scoped in #484).
A constant shift `i32.shl/shr_u/shr_s (val) (i32.const N)` lowers as
`addi tmp,zero,N ; sll/srl/sra rd,val,tmp` — the amount is materialized into a
register, then consumed by the register-form shift. RV32 has immediate shift
forms `slli/srli/srai` carrying the amount in the instruction, so folding a
constant amount drops the `addi` (one instruction per constant shift).

`fold_const_shift` is a post-pass peephole (mirrors the ARM `fold_immediate_shifts`
/ `fold_uxth` scaffolding): for each `addi tmp,zero,N`, the windowed scan finds
the consuming register shift and rewrites it to the immediate form, dropping the
`addi` as a dead store. Soundness:
  * `rs1 != tmp` guard — dropping the `addi` must not remove the shift's input
    definition;
  * the `addi` is removed only when it is a dead store — either the fold's
    destination IS `tmp` (the `slli` redefines it, reading only `rs1`) or `tmp`
    is dead after the shift (`rv_reg_dead_after`, the RV32 analogue of the ARM
    `reg_dead_by_redef`; an unmodeled op ⇒ can't-prove ⇒ keep);
  * `shamt = N & 31` reproduces the register `sll`'s hardware low-5-bit mask =
    WASM's shift-mod-32, so amounts ≥ 32 and negative constants fold identically.

Only the single-`addi` const form (N in -2048..=2047, covering every meaningful
amount 0..31) folds; a large constant via `lui+addi` stays a register shift.
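The soundness conditions above can be sketched on a toy op stream. This is an illustration under stated assumptions, not the code in `selector.rs`: the `Op` and `Kind` types and `dead_after` are hypothetical stand-ins for `RiscVOp` and `rv_reg_dead_after`, the scan only looks at the directly adjacent op rather than a window, and reaching the end of the stream is conservatively treated as "cannot prove dead".

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind { Sll, Srl, Sra }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Op {
    LoadImm { rd: u8, imm: i32 },                         // addi rd, zero, imm
    ShiftReg { kind: Kind, rd: u8, rs1: u8, rs2: u8 },    // sll/srl/sra
    ShiftImm { kind: Kind, rd: u8, rs1: u8, shamt: u8 },  // slli/srli/srai
    Other,                                                // unmodeled op
}

/// True only if `reg` is provably redefined before any read in `ops`.
fn dead_after(ops: &[Op], reg: u8) -> bool {
    for op in ops {
        match *op {
            Op::LoadImm { rd, .. } => {
                if rd == reg { return true; }
            }
            Op::ShiftReg { rd, rs1, rs2, .. } => {
                if rs1 == reg || rs2 == reg { return false; }
                if rd == reg { return true; }
            }
            Op::ShiftImm { rd, rs1, .. } => {
                if rs1 == reg { return false; }
                if rd == reg { return true; }
            }
            Op::Other => return false, // can't prove, so keep the addi
        }
    }
    false
}

fn fold_const_shift(ops: &[Op]) -> Vec<Op> {
    let mut out = Vec::with_capacity(ops.len());
    let mut i = 0;
    while i < ops.len() {
        if let (Op::LoadImm { rd: tmp, imm }, Some(&Op::ShiftReg { kind, rd, rs1, rs2 })) =
            (ops[i], ops.get(i + 1))
        {
            let input_ok = rs1 != tmp; // the shift's input must not be the addi
            let addi_dead = rd == tmp || dead_after(&ops[i + 2..], tmp);
            if rs2 == tmp && input_ok && addi_dead {
                // `imm & 31` mirrors the hardware's low-5-bit mask.
                out.push(Op::ShiftImm { kind, rd, rs1, shamt: (imm & 31) as u8 });
                i += 2;
                continue;
            }
        }
        out.push(ops[i]);
        i += 1;
    }
    out
}
```

In this model, `addi t0,zero,33; sll t0,a1,t0` folds to a single `slli t0,a1,1`, while a stream where `t0` is followed by an unmodeled op is left untouched.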

Flag-off behind `SYNTH_RV_SHIFT_FOLD` (default off): with the env unset the
output is byte-identical to the pre-lever baseline, so the frozen RV32 fixtures
(control_step / signed_div_const) are unchanged — frozen-safe by construction.
The on-target cycle win is validated before the default-on flip.
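The flag-off property can be sketched as a gate around the pass. Only the variable name `SYNTH_RV_SHIFT_FOLD` comes from the commit message; how the backend actually parses it is not shown here, so the rule below (any non-empty value other than `0` enables the lever) and the helper names are assumptions for illustration.

```rust
use std::env;

/// Assumed parsing rule: unset, empty, or "0" means off; anything else is on.
fn shift_fold_enabled_from(value: Option<&str>) -> bool {
    matches!(value, Some(v) if !v.is_empty() && v != "0")
}

/// Reads the real environment variable named in the commit message.
fn shift_fold_enabled() -> bool {
    shift_fold_enabled_from(env::var("SYNTH_RV_SHIFT_FOLD").ok().as_deref())
}

/// With the flag off the op stream is returned unchanged, which is what makes
/// the output byte-identical to the pre-lever baseline.
fn apply_lever<T>(ops: Vec<T>, enabled: bool, fold: impl Fn(Vec<T>) -> Vec<T>) -> Vec<T> {
    if enabled { fold(ops) } else { ops }
}
```

Keeping the parsing in a pure function makes the gate testable without mutating the process environment.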

Oracle (scripts/repro/shift_fold.wat + shift_fold_riscv_differential.py): every
exported function runs under unicorn UC_ARCH_RISCV in both flag states and
matches wasmtime — including the mask cases (`shl33` << 33→1, `shlneg` << -1→31)
and a VARIABLE shift (`shlvar`) that must NOT fold. Non-vacuity: flag-on `.text`
168B→148B (−20B = exactly 5 const shifts folded); flag-off zero. 6 unit tests
cover fold/decline (input-alias guard, live-after, dest==tmp), srl/sra, and the
mask. Full RV32 suite (184) + frozen byte gate (ARM+RV32) green; fmt + clippy
clean.
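The mask cases the oracle checks can also be modeled on the host. The sketch below is not the repository's differential script (which runs unicorn against wasmtime); it is a hypothetical pure-Rust model comparing WebAssembly's shift-count-modulo-32 semantics with what an immediate-form shift carrying `N & 31` would compute. All names in it are illustrative.

```rust
#[derive(Clone, Copy)]
enum ShiftKind { Shl, ShrU, ShrS }

/// Reference: WebAssembly i32 shifts use the count modulo 32.
/// Rust's `wrapping_shl`/`wrapping_shr` apply the same mask to the count.
fn wasm_shift(kind: ShiftKind, value: i32, count: i32) -> i32 {
    let n = count as u32;
    match kind {
        ShiftKind::Shl => value.wrapping_shl(n),
        ShiftKind::ShrU => (value as u32).wrapping_shr(n) as i32,
        ShiftKind::ShrS => value.wrapping_shr(n), // arithmetic on i32
    }
}

/// Folded form: the immediate holds `count & 31`, as slli/srli/srai would.
fn folded_shift(kind: ShiftKind, value: i32, count: i32) -> i32 {
    let shamt = (count & 31) as u32;
    match kind {
        ShiftKind::Shl => ((value as u32) << shamt) as i32,
        ShiftKind::ShrU => ((value as u32) >> shamt) as i32,
        ShiftKind::ShrS => value >> shamt,
    }
}

/// Count disagreements between the two models over a grid of inputs.
fn mismatches(values: &[i32], counts: &[i32]) -> usize {
    let kinds = [ShiftKind::Shl, ShiftKind::ShrU, ShiftKind::ShrS];
    let mut bad = 0;
    for &kind in &kinds {
        for &v in values {
            for &c in counts {
                if wasm_shift(kind, v, c) != folded_shift(kind, v, c) {
                    bad += 1;
                }
            }
        }
    }
    bad
}
```

Running `mismatches` over counts such as 33, -1, 2047, and -2048 exercises the same wrap-around cases as `shl33` and `shlneg`.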

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>