RISC-V backend lacks the ARM perf levers — dissolved code is now behind ARM (measured on ESP32-C3 silicon)
The VCR-SEL/VCR-RA levers that landed for ARM (cmp→select fusion v0.13.0, i32 local-promotion v0.14.0, immediate-shift folding SYNTH_IMM_SHIFT_FOLD) all live in synth-backend/src/arm_backend.rs. They are not in synth-backend-riscv, so RISC-V dissolved code gets none of them.
Evidence (gust_mix, the synth#428 fixture), synth 0.14.0:
- synth compile … -b riscv --target esp32c3 output is byte-identical with vs without each flag: SYNTH_IMM_SHIFT_FOLD=1, SYNTH_NO_LOCAL_PROMOTE=1, SYNTH_NO_CMP_SELECT_FUSE=1 all produce the same 164 B .o. The flags are ARM-only no-ops on RV32.
- grep SYNTH_ synth-backend-riscv/src → no perf-lever env flags exist.
- v0.13.0 → v0.14.0 left the RISC-V gust_mix unchanged (164 B both).
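The byte-identity check above could be scripted roughly as follows. This is a minimal sketch, not part of synth: `flags_that_change_output` is a hypothetical helper, and it assumes the caller has already produced one `.o` per flag setting (for example with `std::fs::read` on each output), since the full compile invocation is elided here.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: returns the labels of the flag settings whose
/// object bytes differ from the baseline build. An empty result means
/// every flag was a no-op for this backend.
fn flags_that_change_output<'a>(
    baseline: &[u8],
    variants: &BTreeMap<&'a str, Vec<u8>>,
) -> Vec<&'a str> {
    variants
        .iter()
        // Keep only the variants whose bytes are not identical to the baseline.
        .filter(|(_, bytes)| bytes.as_slice() != baseline)
        .map(|(label, _)| *label)
        .collect()
}
```

With the three flags listed above, the RV32 result would be an empty list today; once a lever is ported, its flag should start showing up in it.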
Impact (real silicon): on the M4 (G474RE, DWT) those four levers took dissolved gust_mix 64.0 → 48.0 cyc/call (2.21× → 1.66× vs LLVM). On the ESP32-C3 (RISC-V RV32IMC, systimer) dissolved gust_mix is still 2.12× — it has the spill/reloads, register-shifts, and materialized-bool clamps the ARM levers eliminated. So RISC-V dissolved codegen is now the lagging backend.
Ask: port the three levers to synth-backend-riscv — they map cleanly to RV32:
- cmp→select → branchless slt/seqz-fed conditional move pattern (or the Zicond czero.* if targeting it; RV32IMC has no cmov, so the materialized-bool→branch idiom is the target).
- local i32 promotion → keep eligible locals in s1–s11 (callee-saved) instead of stack slots, the same shape as the ARM r4–r8 promotion.
- immediate-shift folding → slli/srli/srai rd,rs,#C instead of li rM,#C; sll rd,rs,rM.
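To make the immediate-shift item concrete, here is a rough peephole sketch over a toy instruction model. Everything in it (`Inst`, `Shift`, `fold_imm_shifts`, the `tmp_dead_after` callback) is hypothetical and not taken from synth-backend-riscv; a real pass would use the backend's own IR and liveness information.

```rust
// Toy RV32 instruction model. All names are illustrative assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Shift { Sll, Srl, Sra }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Inst {
    Li { rd: u8, imm: i32 },                            // li rd, imm
    ShiftReg { op: Shift, rd: u8, rs1: u8, rs2: u8 },   // sll/srl/sra rd, rs1, rs2
    ShiftImm { op: Shift, rd: u8, rs1: u8, shamt: u8 }, // slli/srli/srai rd, rs1, shamt
}

/// Folds `li rM, C; sll rd, rs, rM` into `slli rd, rs, C`.
/// `tmp_dead_after(i)` must return true when the register written by the
/// `li` at index `i` is not read after the shift at `i + 1`.
fn fold_imm_shifts(code: &[Inst], tmp_dead_after: impl Fn(usize) -> bool) -> Vec<Inst> {
    let mut out = Vec::with_capacity(code.len());
    let mut i = 0;
    while i < code.len() {
        if let (Some(&Inst::Li { rd: tmp, imm }), Some(&Inst::ShiftReg { op, rd, rs1, rs2 })) =
            (code.get(i), code.get(i + 1))
        {
            // RV32 shift immediates are 5 bits; skip anything outside 0..=31.
            let in_range = (0..32).contains(&imm);
            // The temp must feed only the shift amount, and its value must
            // not be needed afterwards (overwritten by the shift, or dead).
            let tmp_unused = rd == tmp || tmp_dead_after(i);
            if rs2 == tmp && rs1 != tmp && in_range && tmp_unused {
                out.push(Inst::ShiftImm { op, rd, rs1, shamt: imm as u8 });
                i += 2;
                continue;
            }
        }
        out.push(code[i]);
        i += 1;
    }
    out
}
```

The sketch is deliberately conservative: it only matches adjacent pairs and leaves the sequence alone whenever the temp might still be live, so it can only remove a `li`, never change results.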
Validation is ready: gale's ESP32-C3 silicon harness (benches/gust/esp32c3, RISC-V mcycle-less systimer ratio) + the qemu_riscv32 path can run the same flag-off-vs-on DWT-style gate the ARM levers cleared. Happy to re-measure each on the ESP32-C3 as they land — same protocol as the M4 runs in #428.
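For reference, one possible shape for the flag-off-vs-on gate is sketched below. The `LeverRun` type, its fields, and the pass criterion (no regression beyond a tolerance) are assumptions made for illustration; the actual gate used for the ARM levers is the one described in #428.

```rust
/// Hypothetical record of one lever measurement on a single target:
/// cycles per call with the lever off and on, plus the LLVM build of
/// the same fixture for the ratio.
struct LeverRun {
    off_cycles: f64,
    on_cycles: f64,
    llvm_cycles: f64,
}

impl LeverRun {
    /// Dissolved-vs-LLVM ratio with the lever disabled.
    fn ratio_off(&self) -> f64 {
        self.off_cycles / self.llvm_cycles
    }

    /// Dissolved-vs-LLVM ratio with the lever enabled.
    fn ratio_on(&self) -> f64 {
        self.on_cycles / self.llvm_cycles
    }

    /// Assumed criterion: enabling the lever must not cost more than
    /// `tolerance` (a fraction, e.g. 0.01 for 1%) over the flag-off run.
    fn passes(&self, tolerance: f64) -> bool {
        self.on_cycles <= self.off_cycles * (1.0 + tolerance)
    }
}
```

On the ESP32-C3 the cycle fields would hold systimer-derived counts rather than DWT cycles; because both ratios divide by the same LLVM measurement, the comparison does not depend on the timer's unit.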
Refs: synth#428 (the ARM lever tracker + the M4 silicon results); gale benches/gust/esp32c3/RESULTS.md.