feat(vcr-ra): dead-frame elimination for promoted-local leaves, flag-off (#390, #242)#481

Merged: avrabe merged 1 commit into main from vcr-ra/002-leaf-prologue on Jun 25, 2026

avrabe (Contributor) commented Jun 25, 2026

VCR-RA-002: lever 2 of the perf feature loop (uxth → 002 #468 #472)

`compute_local_layout` reserves a frame slot (`sub sp,#N` / `add sp,#N`) for every non-param wasm local it sees. Local promotion (v0.14.0) then homes the eligible i32 locals in registers, so when a function's locals all promote, and it neither spills, calls, nor touches i64 / stack-passed params, those frame bytes are never accessed. The `sub`/`add sp` pair is then pure overhead (~2-3 cyc on a small leaf), and because it writes SP it also makes `shrink_callee_saved_saves` decline (that pass bails on any SP def/use).

`elide_dead_frame` removes the pair when the body provably never touches SP. That saves the two instructions and restores the SP-untouched precondition the shrink pass needs, so the two passes compose. It runs before shrink in the relocatable pipeline.

Safe-by-construction

Fires only when no instruction reads or writes SP except the matched frame `sub`/`add` and the prologue Push / epilogue Pop. For wasm locals that guard is exact deadness: locals are not addressable, so every other SP consumer (spills, #204 param-backing, the i64 pair-spill area, the #359 outgoing-arg region, incoming stack params) manifests as an `[sp,#off]` access the guard sees. Any such access, or any unmodeled op whose SP effect can't be confirmed absent, declines and leaves the bytes unchanged. Removal-only: no instruction is added, rewritten, or reordered.
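
The guard described above can be sketched roughly as follows. This is a minimal illustration, not the code in `liveness.rs`: the `Inst` enum is a hypothetical, heavily simplified instruction model, and only the name `elide_dead_frame` comes from the PR. The sketch declines (returns `None`) on any SP-relative access, any unmodeled op, an unbalanced `add sp`, or a missing frame, and otherwise only removes instructions.

```rust
/// Hypothetical, simplified instruction model (an assumption for this
/// sketch; the real pass works on the backend's own ARM instruction type).
#[derive(Debug, Clone, PartialEq)]
enum Inst {
    Push,       // prologue push {..., lr}
    Pop,        // epilogue pop
    SubSp(u32), // sub sp, #N (frame allocation)
    AddSp(u32), // add sp, #N (frame release)
    SpRelative, // any [sp, #off] load or store
    Unmodeled,  // op whose SP effect cannot be confirmed absent
    Other,      // op known not to read or write SP
}

/// Returns the body with the dead frame pair removed, or `None` to
/// decline and leave the body unchanged.
fn elide_dead_frame(body: &[Inst]) -> Option<Vec<Inst>> {
    let mut frame_size: Option<u32> = None;
    let mut releases = 0usize;
    for inst in body {
        match inst {
            Inst::SubSp(n) => {
                if frame_size.is_some() {
                    return None; // more than one allocation: decline
                }
                frame_size = Some(*n);
            }
            Inst::AddSp(n) => {
                if frame_size != Some(*n) {
                    return None; // unbalanced or unmatched release
                }
                releases += 1; // one release per epilogue is fine
            }
            // Any other SP consumer, or anything we cannot model, declines.
            Inst::SpRelative | Inst::Unmodeled => return None,
            Inst::Push | Inst::Pop | Inst::Other => {}
        }
    }
    if frame_size.is_none() || releases == 0 {
        return None; // no frame to remove
    }
    // Removal-only: keep every other instruction in its original order.
    Some(
        body.iter()
            .filter(|i| !matches!(i, Inst::SubSp(_) | Inst::AddSp(_)))
            .cloned()
            .collect(),
    )
}
```

Returning `Option` keeps the "decline leaves the bytes unchanged" property explicit: the caller keeps the original body whenever the result is `None`.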

Gating

  • Flag-off (opt-in `SYNTH_DEAD_FRAME_ELIM=1`); the default path is byte-identical, so the frozen byte gate stays green.
  • Default-on flip held for on-silicon validation, like the realloc/shrink levers.
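
A flag-off gate of this kind might look like the sketch below. Only the variable name `SYNTH_DEAD_FRAME_ELIM` comes from the PR; the helper names, and the choice to treat only the exact value `1` as "on", are assumptions made for illustration.

```rust
use std::env;

/// Pure policy helper, so the rule can be tested without touching the
/// process environment. Accepting only "1" is an assumption of this sketch.
fn flag_is_on(value: Option<&str>) -> bool {
    value == Some("1")
}

/// Reads the opt-in flag; an unset or unreadable variable means "off",
/// which keeps the default path unchanged.
fn dead_frame_elim_enabled() -> bool {
    flag_is_on(env::var("SYNTH_DEAD_FRAME_ELIM").ok().as_deref())
}
```

Splitting the policy from the environment read keeps the default-off behaviour easy to unit test.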

Validation

  • 6 unit tests — removes / declines on sp-relative access / unbalanced add-sp / unmodeled sp-effect / no-frame noop / multiple epilogues.
  • `leaf_dead_frame_differential.py`: leaf3 under unicorn, flag-off == flag-on == wasmtime over 10 vectors (signed + i32-wrap edges); 36 B → 28 B (-8 B). Both builds return cleanly via popped LR, confirming SP balance.
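
The three-way equality in the differential check can be illustrated with a small harness. This is a sketch under stated assumptions: the real script is Python and drives unicorn and wasmtime, whereas here plain closures stand in for the three executions, `leaf3_stand_in` is a hypothetical body (the actual leaf3 function is not shown in the PR), and the vectors are illustrative rather than the script's ten.

```rust
/// Hypothetical stand-in for leaf3: three i32 inputs combined with
/// wrapping arithmetic, since wasm i32 operations wrap on overflow.
fn leaf3_stand_in(a: i32, b: i32, c: i32) -> i32 {
    a.wrapping_mul(b).wrapping_add(c)
}

/// A few signed and i32-wrap edge inputs (illustrative only).
fn edge_vectors() -> Vec<(i32, i32, i32)> {
    vec![
        (0, 0, 0),
        (1, 2, 3),
        (-1, -1, -1),
        (i32::MAX, 1, 1),
        (i32::MIN, -1, 0),
        (i32::MAX, i32::MAX, i32::MIN),
    ]
}

/// Returns the first input on which either build disagrees with the
/// reference, or `None` when flag-off == flag-on == reference throughout.
fn first_mismatch<A, B, C>(
    vectors: &[(i32, i32, i32)],
    flag_off: A,
    flag_on: B,
    reference: C,
) -> Option<(i32, i32, i32)>
where
    A: Fn(i32, i32, i32) -> i32,
    B: Fn(i32, i32, i32) -> i32,
    C: Fn(i32, i32, i32) -> i32,
{
    vectors.iter().copied().find(|&(a, b, c)| {
        let expected = reference(a, b, c);
        flag_off(a, b, c) != expected || flag_on(a, b, c) != expected
    })
}
```

Reporting the first failing vector, rather than a bare pass/fail, makes a miscompile easier to reproduce in isolation.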

Honest scope

The push stays {r4-r8,lr} on leaf3 — a,b,c land in callee-saved r4,r5,r6 + scratch r7 = 4 saved regs, which shrink pads back to the even-count {r4-r8,lr}. Trimming the push needs the locals out of callee-saved (caller-saved leaf homing), tracked separately as #390.
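
The register count in that note works out as follows. The helper is hypothetical, and it assumes that `lr` is always part of the push and that the padding exists to keep the pushed byte count a multiple of 8; the PR itself only states that shrink pads to an even count.

```rust
/// Number of registers pushed once `lr` is included and the list is
/// padded to an even count (hypothetical helper for illustration).
fn padded_push_count(callee_saved: u32) -> u32 {
    let with_lr = callee_saved + 1; // assumption: lr is always pushed
    with_lr + (with_lr % 2) // add one pad register when the count is odd
}

/// Bytes written by the push, at 4 bytes per 32-bit register.
fn push_bytes(callee_saved: u32) -> u32 {
    padded_push_count(callee_saved) * 4
}
```

For leaf3, the four saved registers r4-r7 plus `lr` make five, so one pad register (r8) brings the list to the six registers of `{r4-r8,lr}`.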

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

codecov Bot commented Jun 25, 2026

Codecov Report

❌ Patch coverage is 91.76471% with 14 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/synth-synthesis/src/liveness.rs | 92.21% | 13 Missing ⚠️ |
| crates/synth-backend/src/arm_backend.rs | 66.66% | 1 Missing ⚠️ |

avrabe merged commit cd4af64 into main on Jun 25, 2026
15 checks passed
avrabe deleted the vcr-ra/002-leaf-prologue branch on June 25, 2026 05:39