Port the ARM perf levers (cmp→select, local-promotion, imm-shift-fold) to the RISC-V backend — RV dissolved code is now behind ARM (2.12× vs 1.66× on silicon)

## RISC-V backend lacks the ARM perf levers — dissolved code is now behind ARM (measured on ESP32-C3 silicon)

The VCR-SEL/VCR-RA levers that landed for ARM (cmp→select fusion v0.13.0, i32 local-promotion v0.14.0, immediate-shift folding `SYNTH_IMM_SHIFT_FOLD`) all live in `synth-backend/src/arm_backend.rs`. They are **not** in `synth-backend-riscv`, so RISC-V dissolved code gets none of them.

**Evidence (gust_mix, the synth#428 fixture), synth 0.14.0:**
- `synth compile … -b riscv --target esp32c3` output is **byte-identical** with vs without each flag — `SYNTH_IMM_SHIFT_FOLD=1`, `SYNTH_NO_LOCAL_PROMOTE=1`, `SYNTH_NO_CMP_SELECT_FUSE=1` all produce the same 164 B `.o`. The flags are ARM-only no-ops on RV32.
- `grep SYNTH_ synth-backend-riscv/src` → no perf-lever env flags exist.
- v0.13.0 → v0.14.0 left the RISC-V `gust_mix` unchanged (164 B both).

**Impact (real silicon):** on the M4 (G474RE, DWT) those levers took dissolved `gust_mix` **64.0 → 48.0 cyc/call (2.21× → 1.66× vs LLVM)**. On the ESP32-C3 (RISC-V RV32IMC, systimer) dissolved `gust_mix` is still at **2.12×**: it still has the spills/reloads, register shifts, and materialized-bool clamps that the ARM levers eliminated. So RISC-V dissolved codegen is now the lagging backend.

**Ask:** port the three levers to `synth-backend-riscv` — they map cleanly to RV32:
- cmp→select → branchless `slt`/`seqz`-fed conditional move pattern (or the Zicond `czero.*` if targeting it; RV32IMC has no cmov, so the materialized-bool→branch idiom is the target).
- local i32 promotion → keep eligible locals in `s1–s11` (callee-saved) instead of stack slots — same shape as the ARM r4–r8 promotion.
- immediate-shift folding → `slli/srli/srai rd,rs,#C` instead of `li rM,#C; sll rd,rs,rM`.
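To make the third lever concrete, here is a minimal peephole sketch of immediate-shift folding. It is illustrative only: the `Inst` and `ShiftOp` types, the `fold_imm_shifts` function, and the `temp_dead_after` liveness callback are hypothetical and are not taken from `synth-backend-riscv` or `arm_backend.rs`. It assumes a linear instruction list and a caller that can say whether the scratch register is dead after the shift.

```rust
// Hypothetical mini-IR for illustration; these are NOT synth's real types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ShiftOp {
    Sll,
    Srl,
    Sra,
}

#[derive(Debug, Clone, PartialEq, Eq)]
enum Inst {
    /// `li rd, imm`
    Li { rd: u8, imm: i32 },
    /// `sll/srl/sra rd, rs1, rs2`
    ShiftReg { op: ShiftOp, rd: u8, rs1: u8, rs2: u8 },
    /// `slli/srli/srai rd, rs1, shamt`
    ShiftImm { op: ShiftOp, rd: u8, rs1: u8, shamt: u8 },
}

/// Folds `li rM, C; sll rd, rs, rM` into `slli rd, rs, C & 31`.
///
/// `temp_dead_after(index, reg)` is an assumed liveness query: it must return
/// true only if `reg` is not read after the instruction at `index`.
fn fold_imm_shifts(
    insts: &[Inst],
    temp_dead_after: impl Fn(usize, u8) -> bool,
) -> Vec<Inst> {
    let mut out = Vec::with_capacity(insts.len());
    let mut i = 0;
    while i < insts.len() {
        if let (
            Some(Inst::Li { rd: tmp, imm }),
            Some(Inst::ShiftReg { op, rd, rs1, rs2 }),
        ) = (insts.get(i), insts.get(i + 1))
        {
            // The constant must feed the shift amount only (rs1 != tmp), and
            // the scratch register must be overwritten or dead afterwards.
            let feeds_amount = rs2 == tmp && rs1 != tmp;
            if feeds_amount && (rd == tmp || temp_dead_after(i + 1, *tmp)) {
                out.push(Inst::ShiftImm {
                    op: *op,
                    rd: *rd,
                    rs1: *rs1,
                    // RV32 register shifts read only the low 5 bits of rs2,
                    // so masking preserves the original behavior.
                    shamt: (*imm & 31) as u8,
                });
                i += 2; // consume both the `li` and the register shift
                continue;
            }
        }
        out.push(insts[i].clone());
        i += 1;
    }
    out
}
```

Calling `fold_imm_shifts` on `li x5, 3; sll x10, x11, x5` with a callback that reports `x5` dead yields a single `slli x10, x11, 3`; with a callback that reports it live, the pair is left untouched. A real port would likely fold at instruction selection rather than as a post-pass, but the legality conditions should be the same.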

Validation is ready: gale's ESP32-C3 silicon harness (`benches/gust/esp32c3`, RISC-V `mcycle`-less systimer ratio) + the qemu_riscv32 path can run the same flag-off-vs-on DWT-style gate the ARM levers cleared. Happy to re-measure each on the ESP32-C3 as they land — same protocol as the M4 runs in #428.

Refs: synth#428 (the ARM lever tracker + the M4 silicon results); gale `benches/gust/esp32c3/RESULTS.md`.

