docs(vcr-ra): RISC-V lever-parity scoping spike: map ARM perf levers to RV32 (#472, #242) #484
Merged
… to RV32 (#472, #242)

Frozen-safe scoping spike (no codegen change) for the RISC-V lever port. Reads the RV32IMAC backend source and measures the per-function overhead, mapping each ARM perf lever to its RV32 status:

- cmp→select: N/A for RV32IMAC. There is no conditional move (Zicond is not in IMAC, no predication); `lower_select` is already the minimal branchy form.
- local-promotion: APPLIES (direct #390 analogue). Non-param i32 locals are always frame-spilled (sw/lw off(sp)); port to s-register homing, leaf-only, carrying the #474 promotion-exhaustion fallback from the start.
- immediate-shift-fold: APPLIES (RV form). Const shift amounts use the register-form sll/sra/srl (li tmp,N; sll); fold to slli/srli/srai (the ops already exist).
- const-address-fold: APPLIES (RISC-V-specific). RV already holds the linmem base in s11 (no base re-materialization, so #468's base-hoist half is N/A), but const lw/sw addresses do `li addr; add tmp,s11,addr; lw/sw off(tmp)` instead of folding to `lw/sw (ADDR+off)(s11)`.

Scope-changing finding: the port is 2 levers + 1 RISC-V-specific fold, not a 1:1 port of all three named ARM levers (cmp→select does not apply to RV32IMAC).

Measured .text (RV32 vs ARM): redundant_base 120B/30insn (const-addr-fold headroom ~56B), leaf_caller_saved 104B (local-promotion), shifts 44B (imm-shift-fold ~8B).

Lays out the gated per-lever implementation plan (each flag-off → RV32 differential → qemu_riscv32/ESP32-C3 cycle gate → flip) and notes the oracle gap: the RV32 path has no cargo byte-gate and no local RISC-V disassembler, so the differential needs an RV32 execution harness + a small instruction decoder, built as part of step 1.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
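The const-address-fold decision described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the backend's code: `ConstAccess`, `Lowering`, and `lower_const_access` are hypothetical names, and the range check comes from the RV32 `lw`/`sw` encoding (a signed 12-bit immediate, -2048..=2047) rather than from anything the scoping notes spell out. How the real port treats addresses outside that range is not stated in the source; the sketch simply keeps the existing three-instruction form.

```rust
/// Largest offset an RV32 `lw`/`sw` can carry in its signed 12-bit immediate.
const IMM12_MAX: u32 = 2047;

/// Hypothetical model of a linear-memory access whose address operand is a
/// compile-time constant (names are illustrative, not from the backend).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ConstAccess {
    /// Constant address operand (the `i32.const` feeding the load/store).
    pub addr: u32,
    /// Static offset attached to the memory instruction.
    pub offset: u32,
}

/// The two lowerings discussed in the scoping notes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Lowering {
    /// `lw/sw imm(s11)`: one instruction, address folded into the immediate.
    Folded { imm: i32 },
    /// `li addr; add tmp,s11,addr; lw/sw off(tmp)`: the current form.
    Unfolded,
}

/// Fold only when `addr + offset` is representable as a non-negative
/// 12-bit immediate; otherwise keep the unfolded sequence.
pub fn lower_const_access(access: ConstAccess) -> Lowering {
    match access.addr.checked_add(access.offset) {
        // Both operands are unsigned, so only the upper bound needs checking.
        Some(sum) if sum <= IMM12_MAX => Lowering::Folded { imm: sum as i32 },
        // Too large for the immediate, or the 32-bit sum overflowed.
        _ => Lowering::Unfolded,
    }
}
```

For example, a constant address of 16 with a static offset of 4 folds to a single `lw 20(s11)`, while an effective address of 2048 stays in the unfolded form.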
Codecov Report: ✅ All modified and coverable lines are covered by tests.
This was referenced Jun 25, 2026
avrabe added a commit that referenced this pull request Jun 25, 2026
…#472, #242) (#486)

Traceability sync: the VCR-* roadmap's update logs had drifted behind the shipped RISC-V lever-port prep. Records the RV32 lever-baseline slice (#472/#484/#485) under VCR-ORACLE-001, its accurate home (it already logs the RV32 oracle slices: the frozen-fixture byte gate and the cmp-select execution differential).

The entry captures:

- the three `*_baseline_472` selector tests pinning the current pre-lever RV32 codegen at the RiscVOp-stream level (const-address store unfolded, register-form shift, frame-spilled local), green today and flipping when each lever lands default-on, so a codegen change on the un-byte-gated RV32 path surfaces as a reviewed assertion update;
- the scoping finding that reshaped the port: cmp->select is N/A for RV32IMAC (no conditional-move), so it is local-promotion + immediate-shift-fold + a RISC-V-specific const-address-fold, not a 1:1 port.

Frozen-safe: a single description append + a `riscv` tag on an existing item; no status change, no new links. rivet validate clean (0 non-cross-repo errors under the CI gate). The ARM perf levers' roadmap reconciliation is deliberately left to a focused pass rather than slotted into ambiguous homes here.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
avrabe added a commit that referenced this pull request Jun 25, 2026
…i/srli/srai, flag-off (#472, #242) (#487)

Port the first applicable ARM perf lever to the RV32 backend (scoped in #484). A constant shift `i32.shl/shr_u/shr_s (val) (i32.const N)` lowers as `addi tmp,zero,N ; sll/srl/sra rd,val,tmp`: the amount is materialized into a register, then consumed by the register-form shift. RV32 has immediate shift forms `slli/srli/srai` carrying the amount in the instruction, so folding a constant amount drops the `addi` (one instruction per constant shift).

`fold_const_shift` is a post-pass peephole (mirrors the ARM `fold_immediate_shifts` / `fold_uxth` scaffolding): for each `addi tmp,zero,N`, the windowed scan finds the consuming register shift and rewrites it to the immediate form, dropping the `addi` as a dead store.

Soundness:
* `rs1 != tmp` guard: dropping the `addi` must not remove the shift's input definition;
* the `addi` is removed only when it is a dead store: either the fold's destination IS `tmp` (the `slli` redefines it, reading only `rs1`) or `tmp` is dead after the shift (`rv_reg_dead_after`, the RV32 analogue of the ARM `reg_dead_by_redef`; an unmodeled op ⇒ can't-prove ⇒ keep);
* `shamt = N & 31` reproduces the register `sll`'s hardware low-5-bit mask = WASM's shift-mod-32, so amounts ≥ 32 and negative constants fold identically.

Only the single-`addi` const form (N in -2048..=2047, covering every meaningful amount 0..31) folds; a large constant via `lui+addi` stays a register shift.

Flag-off behind `SYNTH_RV_SHIFT_FOLD` (default off): with the env unset the output is byte-identical to the pre-lever baseline, so the frozen RV32 fixtures (control_step / signed_div_const) are unchanged, frozen-safe by construction. The on-target cycle win is validated before the default-on flip.
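The fold and its three soundness conditions can be sketched roughly as below. This is a simplified illustration, not the `fold_const_shift` from the commit: the `Op` and `ShiftKind` types and `reg_dead_after` are hypothetical stand-ins for the backend's op stream and `rv_reg_dead_after`, and the sketch only looks at an `addi` immediately followed by its consuming shift, whereas the commit describes a windowed scan.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ShiftKind { Sll, Srl, Sra }

/// Hypothetical, heavily reduced op stream (registers are plain numbers).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Op {
    /// `addi rd, zero, imm`
    LoadImm { rd: u8, imm: i32 },
    /// `sll/srl/sra rd, rs1, rs2`
    ShiftReg { kind: ShiftKind, rd: u8, rs1: u8, rs2: u8 },
    /// `slli/srli/srai rd, rs1, shamt`
    ShiftImm { kind: ShiftKind, rd: u8, rs1: u8, shamt: u8 },
    /// Anything this sketch does not model.
    Other,
}

/// True only if `reg` is provably overwritten before it is read again.
/// An unmodeled op or the end of the stream means "cannot prove", so false.
fn reg_dead_after(rest: &[Op], reg: u8) -> bool {
    for op in rest {
        match *op {
            Op::LoadImm { rd, .. } => {
                if rd == reg { return true; }
            }
            Op::ShiftReg { rd, rs1, rs2, .. } => {
                if rs1 == reg || rs2 == reg { return false; }
                if rd == reg { return true; }
            }
            Op::ShiftImm { rd, rs1, .. } => {
                if rs1 == reg { return false; }
                if rd == reg { return true; }
            }
            Op::Other => return false,
        }
    }
    false
}

/// Rewrite `addi tmp,zero,N ; shift rd,rs1,tmp` into the immediate form.
pub fn fold_const_shift(ops: &[Op]) -> Vec<Op> {
    let mut out = Vec::with_capacity(ops.len());
    let mut i = 0;
    while i < ops.len() {
        if let (Op::LoadImm { rd: tmp, imm }, Some(&Op::ShiftReg { kind, rd, rs1, rs2 })) =
            (ops[i], ops.get(i + 1))
        {
            // Guard 1: the shifted value must not be the constant itself.
            let input_ok = rs1 != tmp;
            // Guard 2: the addi must be a dead store once the shift is folded.
            let addi_dead = rd == tmp || reg_dead_after(&ops[i + 2..], tmp);
            if rs2 == tmp && input_ok && addi_dead {
                // Guard 3: low-5-bit mask, matching the register-form shift.
                let shamt = (imm & 31) as u8;
                out.push(Op::ShiftImm { kind, rd, rs1, shamt });
                i += 2;
                continue;
            }
        }
        out.push(ops[i]);
        i += 1;
    }
    out
}
```

The conservative default in `reg_dead_after` mirrors the "can't-prove ⇒ keep" rule: when liveness is unknown, the `addi` and the register-form shift are left untouched.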
Oracle (scripts/repro/shift_fold.wat + shift_fold_riscv_differential.py): every exported function runs under unicorn UC_ARCH_RISCV in both flag states and matches wasmtime, including the mask cases (`shl33` << 33→1, `shlneg` << -1→31) and a VARIABLE shift (`shlvar`) that must NOT fold.

Non-vacuity: flag-on `.text` 168B→148B (−20B = exactly 5 const shifts folded); flag-off zero.

6 unit tests cover fold/decline (input-alias guard, live-after, dest==tmp), srl/sra, and the mask. Full RV32 suite (184) + frozen byte gate (ARM+RV32) green; fmt + clippy clean.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
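The mask cases checked by the oracle come down to a small arithmetic identity, worked through below. This is an illustrative sketch only: the function names are hypothetical and it models the shifts in plain Rust rather than running the actual unicorn/wasmtime differential. It relies on `wrapping_shl` and `wrapping_shr` masking the shift count to the bit width, which matches WebAssembly's shift-count-mod-32 rule for `i32`.

```rust
/// Reference semantics of `i32.shl`: the count is taken modulo 32.
pub fn wasm_i32_shl(val: u32, count: i32) -> u32 {
    val.wrapping_shl(count as u32)
}

/// Reference semantics of `i32.shr_s`: arithmetic shift, count modulo 32.
pub fn wasm_i32_shr_s(val: i32, count: i32) -> i32 {
    val.wrapping_shr(count as u32)
}

/// What a folded `slli rd, rs1, N & 31` computes.
pub fn folded_slli(val: u32, n: i32) -> u32 {
    let shamt = (n & 31) as u32; // always in 0..=31, so `<<` cannot overflow
    val << shamt
}

/// What a folded `srai rd, rs1, N & 31` computes.
pub fn folded_srai(val: i32, n: i32) -> i32 {
    let shamt = (n & 31) as u32;
    val >> shamt // `>>` on i32 is an arithmetic (sign-extending) shift
}
```

With these definitions, a constant of 33 yields a shift amount of 1 and a constant of -1 yields 31, the same amounts the `shl33` and `shlneg` cases record, and the folded and reference forms agree for every constant in the single-`addi` range -2048..=2047.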
#472 scoping spike: frozen-safe (no codegen change)

Maps each ARM perf lever (landed v0.13–v0.15) to its RV32IMAC status, source-grounded in `synth-backend-riscv/src/selector.rs` and measured by `.text` size. The byte-changing ports are the explicitly separate next gated steps.

Scope-changing finding

The port is 2 levers + 1 RISC-V-specific fold, not a 1:1 port of the three named ARM levers:

- cmp→select: N/A for RV32IMAC (no conditional move); `lower_select` is already the minimal branchy form.
- local-promotion: applies. Non-param i32 locals are always frame-spilled (`sw/lw off(sp)`); port to s-register homing, leaf-only, carrying the #474 fallback from the start.
- immediate-shift-fold: applies. Const shift amounts use the register-form `sll/sra/srl` (`li tmp,N; sll`); fold to `slli/srli/srai` (ops already exist).
- const-address-fold: applies (RISC-V-specific). The linmem base is already held in `s11` (no #468-style re-materialization), but const `lw/sw` addresses do `li addr; add tmp,s11,addr; lw/sw off(tmp)` instead of folding to `lw/sw (ADDR+off)(s11)`.

Measured `.text` (RV32 vs ARM)

- `redundant_base_materialization`: 120B / 30 insn (const-address-fold headroom ~56B)
- `leaf_caller_saved`: 104B (local-promotion)
- `shifts`: 44B (imm-shift-fold ~8B)

Gated plan

Per-lever PRs (imm-shift-fold → const-address-fold → local-promotion), each flag-off → RV32 differential → qemu_riscv32/ESP32-C3 cycle gate → flip. Oracle gap noted: the RV32 path has no cargo byte-gate and no local RISC-V disassembler, so the differential needs an RV32 execution harness + a small instruction decoder, built as part of step 1.
This is the frozen-safe measure-before-optimize scoping for #472's implementation, mirroring the #468 scoping doc that proved load-bearing.
🤖 Generated with Claude Code