feat(vcr-ra): RV32 const-address-fold - fold constant addr into the access immediate off s11, flag-off (#472, #242) #491
Merged
…ccess immediate off s11, flag-off (#472, #242)

Loop-4 step 2 of the RISC-V lever-parity port (#472), the RISC-V analogue of the ARM base-CSE address half (#468). An `i32.load/store (i32.const ADDR) …` lowers as `addi a,zero,ADDR; add tmp,s11,a; lw/sw _,off(tmp)`; when `ADDR+off` fits the signed-12-bit access immediate, `fold_const_addr` collapses it to a single `lw/sw _,(ADDR+off)(s11)`, dropping the `add` and the address `addi`: 2 instructions saved per constant-address access. Post-pass peephole (the structural twin of the #487 shift fold).

Soundness:
* `ADDR+off` is range-checked as a SUM against [-2048, 2047] (each term individually fits 12 bits, but two in-range values can still sum out of range);
* the `add` base must be s11 and its address operand an `addi a,zero,ADDR` (single-`addi` small constant; a `lui+addi` large address stays in the `add` form, out of v1 scope);
* it is a 3->1 rewrite, so BOTH dropped temps must be dead: `tmp` (add result) read only by the access, and `a` (address constant) read only by the `add` (rv_reg_dead_after + an untouched-between-def-and-use check); a bounds check between the add and the access reads `a` and disqualifies the fold.

Flag-off behind SYNTH_RV_ADDR_FOLD (default off => byte-identical to baseline, so the frozen RV32 fixtures and `const_addr_store_not_folded_baseline_472` (#485) stay green: frozen-safe). The on-target cycle win is validated before the flip.

Oracle (scripts/repro/const_addr_fold_riscv_differential.py, reusing the redundant_base_materialization fixture): runs `init_fields` (7 constant-address stores) under unicorn UC_ARCH_RISCV in both flag states; the resulting linear MEMORY is bit-identical to wasmtime. Non-vacuity: .text 120B -> 64B (-56B = 14 instructions, 2 per store). CI-gated as an isolated `rv32-const-addr-fold-oracle` job mirroring the shift-fold oracle.

5 unit tests (store/load fold, offset sum, 12-bit range guard, addr-reused decline). RV32 suite (189) + frozen byte gate (ARM+RV32) green; fmt + clippy clean.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
What
Loop-4 step 2 of the RISC-V lever-parity port (#472): the RISC-V analogue of the ARM base-CSE address half (#468). An `i32.load/store (i32.const ADDR) …` lowers as `addi a,zero,ADDR; add tmp,s11,a; lw/sw _,off(tmp)`. When `ADDR+off` fits the signed-12-bit access immediate, `fold_const_addr` collapses it to a single `lw/sw _,(ADDR+off)(s11)`, dropping the `add` and the address `addi`: 2 instructions saved per constant-address access.
How
A post-pass peephole, the structural twin of the #487 shift fold. Soundness:
* `ADDR+off` is range-checked as a SUM against `[-2048, 2047]` (each term individually fits 12 bits, but two in-range values can still sum out of range);
* the `add` base must be `s11` and its address operand an `addi a,zero,ADDR` (single-`addi` small constant; a `lui+addi` large address stays in the `add` form, out of v1 scope);
* it is a 3->1 rewrite, so BOTH dropped temps must be dead: `tmp` (add result) read only by the access, and `a` (address constant) read only by the `add` (`rv_reg_dead_after` + an untouched-between-def-and-use check); a bounds check between the add and the access reads `a` and disqualifies the fold.
Frozen-safe
Flag-off behind `SYNTH_RV_ADDR_FOLD` (default off ⇒ byte-identical to baseline). The frozen RV32 fixtures and `const_addr_store_not_folded_baseline_472` (#485, the flip-marker) stay green. The on-target cycle win is validated before the default-on flip.
Oracle
`scripts/repro/const_addr_fold_riscv_differential.py` (reuses the `redundant_base_materialization` fixture): runs `init_fields` (7 constant-address stores) under unicorn `UC_ARCH_RISCV` in both flag states; the resulting linear memory is bit-identical to wasmtime.
CI-gated as an isolated `rv32-const-addr-fold-oracle` job (mirrors the shift-fold oracle; emulation deps pip-installed in that job only). Verified locally with the exact CI invocation (debug binary).
Tests / gates
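* 5 unit tests (store/load fold, offset sum, 12-bit range guard, addr-reused decline);
* RV32 suite (189) + frozen byte gate (ARM+RV32) green;
* `fmt` + `clippy` clean.
Part of epic #242 (VCR-*), continuing the #472 lever-parity slice after the imm-shift-fold (#487).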
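For reviewers, the soundness conditions listed under How can be sketched as a small standalone model. This is not the code from this PR: `Reg`, `Inst`, `fits_imm12`, and `fold_window` are hypothetical names, the window is assumed to be three adjacent instructions (the real pass also handles an untouched-between-def-and-use gap), and the two booleans stand in for what `rv_reg_dead_after` would report.

```rust
// Hypothetical, simplified model of the fold; not the real pass.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Reg { Zero, S11, T0, T1, A0 }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Inst {
    Addi { rd: Reg, rs1: Reg, imm: i32 },
    Add { rd: Reg, rs1: Reg, rs2: Reg },
    Lw { rd: Reg, base: Reg, off: i32 },
    Sw { rs2: Reg, base: Reg, off: i32 },
}

/// True when `v` fits the signed 12-bit load/store immediate.
fn fits_imm12(v: i32) -> bool {
    (-2048..=2047).contains(&v)
}

/// Tries to fold `addi a,zero,ADDR; add tmp,s11,a; lw/sw _,off(tmp)`
/// into a single `lw/sw _,(ADDR+off)(s11)`.
/// `a_dead` / `tmp_dead` are assumed liveness answers: each dropped
/// temp must have no reader other than the instruction shown here.
fn fold_window(w: &[Inst; 3], a_dead: bool, tmp_dead: bool) -> Option<Inst> {
    // 1. Address constant: a single `addi a, zero, ADDR`.
    let (a, addr) = match w[0] {
        Inst::Addi { rd, rs1: Reg::Zero, imm }
            if rd != Reg::Zero && rd != Reg::S11 => (rd, imm),
        _ => return None,
    };
    // 2. `add tmp, s11, a`: the base must be s11, the operand must be `a`.
    let tmp = match w[1] {
        Inst::Add { rd, rs1: Reg::S11, rs2 } if rs2 == a && rd != Reg::S11 => rd,
        _ => return None,
    };
    // 3. 3->1 rewrite: both dropped temps must be dead.
    if !(a_dead && tmp_dead) {
        return None;
    }
    // 4. Range-check the SUM, not each term on its own.
    match w[2] {
        Inst::Lw { rd, base, off } if base == tmp => {
            let sum = addr.checked_add(off)?;
            fits_imm12(sum).then_some(Inst::Lw { rd, base: Reg::S11, off: sum })
        }
        // A store whose value operand is one of the dropped temps
        // would still read it, so decline.
        Inst::Sw { rs2, base, off } if base == tmp && rs2 != tmp && rs2 != a => {
            let sum = addr.checked_add(off)?;
            fits_imm12(sum).then_some(Inst::Sw { rs2, base: Reg::S11, off: sum })
        }
        _ => None,
    }
}
```

In this model, `addi t0,zero,16; add t1,s11,t0; sw a0,4(t1)` folds to `sw a0,20(s11)`, while `ADDR = 2047` with `off = 1` is declined because the sum leaves the 12-bit range even though each term fits.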
🤖 Generated with Claude Code