Summary
For struct-heavy code, codegen emits a lot of avoidable linear-memory traffic. Three patterns, all visible in the simplest helpers:
1. Read-only struct parameters are defensively memory.copy'd into the callee frame on entry, even when the function never mutates them.
2. The whole frame is memory.fill-zeroed in the prologue, then immediately and fully overwritten by those copies, which makes the fill dead.
3. Small fixed structs are never scalarized (no SROA). A 24-byte Vec3 always lives in linear memory and is accessed via load/store; leaf readers allocate a shadow-stack frame they would not otherwise need.
Evidence — dot(a: Vec3, b: Vec3) -> i64 (a pure reader)
(func $dot (param $a i32) (param $b i32) (result i64)
  (local $f i64) (local $__frame_ptr i32)
  global.get 0  i32.const 48  i32.sub  local.tee $__frame_ptr  global.set 0
  local.get $__frame_ptr  i32.const 0  i32.const 48  memory.fill  ;; (2) zero whole frame…
  local.get $__frame_ptr  local.get $a  i32.const 24  memory.copy  ;; (1) …then copy param a in
  local.get $__frame_ptr  local.set $a
  local.get $__frame_ptr  i32.const 24  i32.add  local.get $b  i32.const 24  memory.copy  ;; …and b
  ...
  ;; only now: 6 i64.load + 2 i64.mul + 2 i64.add + i64.shr_s
)
dot only reads a/b, yet it: allocates a 48-byte frame, zero-fills it, and copies both 24-byte params into it (the fill is fully clobbered by the two copies). With copies elided it would need no frame at all and could load straight from the caller pointers.
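To make the "fully clobbered" claim checkable, here is a minimal sketch of the interval test a backend could run before emitting the prologue fill. It is written in Python purely for illustration; the function name `uncovered_ranges` and the `(offset, length)` representation of frame writes are assumptions, not anything infc actually has. The frame size and copy offsets are the ones from the listing above.

```python
def uncovered_ranges(frame_size, writes):
    """Return the [start, end) byte ranges of a frame that still need zeroing.

    `writes` is a list of (offset, length) stores that are known to execute
    before any read of the frame (hypothetical input; a real pass would have
    to prove this ordering from the IR).
    """
    gaps = []
    cursor = 0  # first byte not yet known to be overwritten
    for start, length in sorted(writes):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, start + length)
    if cursor < frame_size:
        gaps.append((cursor, frame_size))
    return gaps


# dot's frame per the listing: 48 bytes, params copied to offsets 0 and 24.
print(uncovered_ranges(48, [(0, 24), (24, 24)]))  # [] -> the fill is dead
# If only the first param were copied, bytes 24..48 would still need zeroing.
print(uncovered_ranges(48, [(0, 24)]))            # [(24, 48)]
```

An empty result means the whole memory.fill can be dropped; a non-empty result means the fill could be narrowed to just those ranges.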
Module-wide
For out/main.wasm (the ray tracer): 119 memory.copy, 26 memory.fill, 1492 local.get, versus ~12 i64.mul / 2 i64.div_s of "real" arithmetic (most math is delegated to the linked fixmath). Hot-path examples: let sph: Sphere = scene[i] copies 80 bytes per sphere per bounce; every Vec3 helper copies its parameters. All correctness-neutral — output is correct — but it's the dominant cost for this workload.
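For anyone who wants to reproduce these counts, a rough sketch of one way to tally opcodes from the printed WAT follows. It assumes the WAT text has already been produced (for example by wasmprinter) and is available as a string; the helper name `count_ops` is hypothetical. It strips `;;` line comments but does not handle block comments or string literals, so treat the numbers as approximate.

```python
import re
from collections import Counter


def count_ops(wat_text, ops):
    """Count occurrences of each opcode in `ops` within WAT text."""
    # Drop ';;' line comments so opcodes mentioned in comments are not counted.
    code = re.sub(r";;[^\n]*", "", wat_text)
    # WAT tokens are separated by whitespace and parentheses.
    tokens = re.split(r"[\s()]+", code)
    counts = Counter(tokens)
    return {op: counts[op] for op in ops}


sample = """
(func $dot (param $a i32) (param $b i32) (result i64)
  local.get $a
  i32.const 24
  memory.copy ;; a memory.copy mentioned in a comment is ignored
  local.get $b
  memory.fill
)
"""
print(count_ops(sample, ["memory.copy", "memory.fill", "local.get", "i64.mul"]))
# {'memory.copy': 1, 'memory.fill': 1, 'local.get': 2, 'i64.mul': 0}
```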
Suggested directions
1. A simple mutation/escape analysis: pass non-mutated struct params by reference (skip the entry copy). Value semantics only require a copy when the callee mutates the param.
2. Skip the prologue memory.fill for frame regions that are fully overwritten before any read (here, the param slots). (Related to #188, "Redundant zero-initialization in array literal codegen", which fixed the all-zero array-literal case; this is a different trigger: param copies rather than literal stores.)
3. SROA for small fixed-size structs: keep Vec3-sized aggregates in i64 locals instead of linear memory, eliminating frame setup + load/store for temporaries.
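As a rough illustration of the mutation/escape analysis suggested above, the sketch below decides which struct params need an entry copy. It is written in Python over a made-up, flattened IR: the `Op` type and its `kind` values (`load_field`, `store_field`, `escapes`) are hypothetical and do not correspond to infc's actual representation. A real pass would also have to account for aliasing, for example the callee writing through another pointer to the same struct.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    kind: str    # "load_field", "store_field", or "escapes" (hypothetical IR)
    target: str  # name of the struct parameter the op touches


def params_needing_copy(params, body):
    """Return the struct params that must still be copied on entry.

    A param keeps its copy if the body stores to it, or if its address
    escapes somewhere the analysis cannot see (treated conservatively as a
    possible mutation). Everything else can be read via the caller pointer.
    """
    unsafe = {op.target for op in body if op.kind in ("store_field", "escapes")}
    return [p for p in params if p in unsafe]


# dot(a, b) only loads fields, so neither param needs a copy.
dot_body = [Op("load_field", "a"), Op("load_field", "b")] * 3
print(params_needing_copy(["a", "b"], dot_body))  # []

# A hypothetical helper that writes to its first param keeps that one copy.
scale_body = [Op("load_field", "v"), Op("store_field", "v"), Op("load_field", "k")]
print(params_needing_copy(["v", "k"], scale_body))  # ['v']
```

With an empty result for every struct param, a leaf reader like dot would need no frame at all, which also removes the prologue fill.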
Related: #188 (redundant zero-init in array-literal codegen, fixed).
Environment: infc 0.0.1 (commit df45600, ABI 1.0), Inferara/inference @ d82d935 (wasm-linker). Found while implementing a fixed-point ray tracer entirely in Inference (scratch/raytracing-in-one-weekend). WAT produced via wasmprinter.