Port the ARM perf levers (cmp→select, local-promotion, imm-shift-fold) to the RISC-V backend — RV dissolved code is now behind ARM (2.12× vs 1.66× on silicon) #472

@avrabe

RISC-V backend lacks the ARM perf levers — dissolved code is now behind ARM (measured on ESP32-C3 silicon)

The VCR-SEL/VCR-RA levers that landed for ARM (cmp→select fusion v0.13.0, i32 local-promotion v0.14.0, immediate-shift folding SYNTH_IMM_SHIFT_FOLD) all live in synth-backend/src/arm_backend.rs. They are not in synth-backend-riscv, so RISC-V dissolved code gets none of them.

Evidence (gust_mix, the synth#428 fixture), synth 0.14.0:

  • synth compile … -b riscv --target esp32c3 output is byte-identical with and without each flag: SYNTH_IMM_SHIFT_FOLD=1, SYNTH_NO_LOCAL_PROMOTE=1, and SYNTH_NO_CMP_SELECT_FUSE=1 all produce the same 164 B .o. The flags are ARM-only and are no-ops on RV32.
  • grep SYNTH_ synth-backend-riscv/src → no perf-lever env flags exist.
  • v0.13.0 → v0.14.0 left the RISC-V gust_mix unchanged (164 B both).
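The byte-identity check above could be scripted roughly as follows. This is only a sketch: `build_with_env` and `flags_with_effect` are hypothetical helper names, not part of synth, and the program, its arguments, and the output path are supplied by the caller rather than assumed here.

```rust
use std::{fs, io, path::Path, process::Command};

/// Run `program args...`, optionally with one extra environment variable set,
/// then read the object file the build is expected to write.
/// Hypothetical harness helper: nothing here is synth's own API.
fn build_with_env(
    program: &str,
    args: &[&str],
    var: Option<(&str, &str)>,
    out: &Path,
) -> io::Result<Vec<u8>> {
    let mut cmd = Command::new(program);
    cmd.args(args);
    if let Some((key, value)) = var {
        cmd.env(key, value);
    }
    let status = cmd.status()?;
    if !status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("{program} exited with {status}"),
        ));
    }
    fs::read(out)
}

/// Return the names of the flags whose output bytes differ from the baseline.
fn flags_with_effect<'a>(baseline: &[u8], variants: &[(&'a str, Vec<u8>)]) -> Vec<&'a str> {
    variants
        .iter()
        .filter(|(_, bytes)| bytes.as_slice() != baseline)
        .map(|(name, _)| *name)
        .collect()
}
```

An empty result from `flags_with_effect` corresponds to the observation reported here: no flag changes the RV32 object file.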

Impact (real silicon): on the M4 (G474RE, DWT) those levers took dissolved gust_mix from 64.0 to 48.0 cyc/call (2.21× → 1.66× vs LLVM). On the ESP32-C3 (RISC-V RV32IMC, systimer) dissolved gust_mix is still at 2.12×: it keeps the spills/reloads, register shifts, and materialized-bool clamps that the ARM levers eliminated. So RISC-V is now the lagging backend for dissolved codegen.

Ask: port the three levers to synth-backend-riscv — they map cleanly to RV32:

  • cmp→select → branchless slt/seqz-fed conditional move pattern (or the Zicond czero.* if targeting it; RV32IMC has no cmov, so the materialized-bool→branch idiom is the target).
  • local i32 promotion → keep eligible locals in s1–s11 (callee-saved) instead of stack slots — same shape as the ARM r4–r8 promotion.
  • immediate-shift folding → slli/srli/srai rd,rs,#C instead of li rM,#C; sll rd,rs,rM.
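To make the immediate-shift lever concrete, here is a minimal peephole sketch over a toy instruction model. The `Inst` and `ShiftOp` types and `fold_imm_shifts` are illustrative assumptions, not synth-backend-riscv's real IR or API. The sketch only handles an `li` immediately followed by the register shift that consumes it. It drops the `li` only when the shift overwrites the temporary, and otherwise leaves it for a later dead-code pass.

```rust
/// Shift kinds shared by the register and immediate forms.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ShiftOp {
    Sll,
    Srl,
    Sra,
}

/// Toy RV32 instruction model (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq)]
enum Inst {
    /// li rd, imm
    Li { rd: u8, imm: i32 },
    /// sll/srl/sra rd, rs, rm (shift amount taken from register rm)
    ShiftReg { op: ShiftOp, rd: u8, rs: u8, rm: u8 },
    /// slli/srli/srai rd, rs, shamt
    ShiftImm { op: ShiftOp, rd: u8, rs: u8, shamt: u8 },
}

/// Rewrite `li rM, C; sXX rd, rs, rM` into `sXXi rd, rs, C & 31`.
/// RV32 register shifts use only the low 5 bits of rM, hence the mask.
fn fold_imm_shifts(code: &[Inst]) -> Vec<Inst> {
    let mut out = Vec::with_capacity(code.len());
    let mut i = 0;
    while i < code.len() {
        if let (Some(Inst::Li { rd: tmp, imm }), Some(Inst::ShiftReg { op, rd, rs, rm })) =
            (code.get(i), code.get(i + 1))
        {
            if tmp == rm {
                let shamt = (*imm & 31) as u8;
                // The li is provably dead only if the shift overwrites rM
                // and does not also read it as the value being shifted.
                let li_is_dead = rd == rm && rs != rm;
                if !li_is_dead {
                    out.push(code[i].clone());
                }
                out.push(Inst::ShiftImm { op: *op, rd: *rd, rs: *rs, shamt });
                i += 2;
                continue;
            }
        }
        out.push(code[i].clone());
        i += 1;
    }
    out
}
```

A real pass would use liveness information to remove the `li` in the common case where the temporary differs from `rd`. The conservative rule here just keeps the sketch correct without that analysis.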

Validation is ready: gale's ESP32-C3 silicon harness (benches/gust/esp32c3, RISC-V mcycle-less systimer ratio) + the qemu_riscv32 path can run the same flag-off-vs-on DWT-style gate the ARM levers cleared. Happy to re-measure each on the ESP32-C3 as they land — same protocol as the M4 runs in #428.

Refs: synth#428 (the ARM lever tracker + the M4 silicon results); gale benches/gust/esp32c3/RESULTS.md.
