feat/benchmark_new_version#65

Merged
Barnadrot merged 2 commits into main from pw4_2-2026-05-13 on May 14, 2026

@Barnadrot
Owner

Redesigned benchmarking to improve experiment gate.

pw4_2 agent and others added 2 commits May 14, 2026 10:21
Session result: 1 confirmed real win (h5 orphan, -0.83% wall-clock,
p=8e-06, n=24, reproduced n=40), 9 discards, 1 dead-end (D.2 already
applied). Counter at 9/12 — ended early because all further attempts
on the SIMD Poseidon surface produced LLVM-neutralized null results
and the h5 orphan is shippable on its own.

h5 mechanism: manual unroll of the 15-iter rank-1 update in partial
round SIMD body. Eliminated the 0x780(%rsp,%rsi,1) indexed-stack spill
that perf annotate showed at 2.28%+1.31%=3.6% on baseline.

Codegen verified: vmovdqa64 0x780(%rsp,%rsi,1) instance count went
from 2 (baseline) to 0 (h5 binary).
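
A check like this can be scripted. The sketch below is one possible way to count the instruction in disassembly text, not the tooling used in this session. The operand pattern comes from the message above; the function name and the idea of feeding it `objdump -d` output are assumptions.

```python
import re

# Operand pattern taken from the commit message above. The helper name and
# the input format (plain disassembly text, one instruction per line) are
# hypothetical.
SPILL_PATTERN = re.compile(r"vmovdqa64\s+0x780\(%rsp,%rsi,1\)")


def count_spill_instances(disassembly: str) -> int:
    """Count disassembly lines that contain the indexed-stack vmovdqa64 access."""
    return sum(1 for line in disassembly.splitlines() if SPILL_PATTERN.search(line))
```

Running it on the baseline and candidate disassembly and comparing the two counts gives the same kind of 2-versus-0 comparison reported above.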

Held on branch as orphan (commit 943b65a1 on leanMultisig:pw4_2-2026-05-13).
Recommend brain manually promote to shipped — the gain is real,
statistically robust, codegen-verified, and free. Bundle attempts
h6-h11 all failed to push the cumulative over the 1.0% gate threshold,
but h5 alone is worth shipping.

Verdict + pr_body include: full hypothesis table, confirmed dead-ends
(loop unrolls outside h5 case, SSA-hoist, lambda fusion, x2 batching,
prefetch, rayon pairing — all LLVM-neutralized), and recommendation
for follow-on profiling post-h5.

Post-pw4 audit: baseline drift inflated per-iteration deltas by 5-10x due to
position bias, low sample count, and page cache asymmetry. This commit
addresses all identified noise sources:

- N=1→N=3 default (12 samples/side, ~85% power vs ~40%)
- Counterbalanced round ordering (odd=base→cand, even=cand→base)
- Core pinning via taskset (one thread per physical core)
- Page cache drop between sides (sync + drop_caches)
- A-vs-A noise floor calibration (soft-fail, no loop disruption)
- New eval_steady_state.sh for production-regime measurement (proof 200+)
- Auto-chain steady-state on keeps exceeding 2%
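
The counterbalanced ordering in the list above can be sketched as follows. This is a minimal illustration and not the harness code: the function names are hypothetical, and rounds are assumed to be numbered from 1.

```python
from typing import List, Tuple


def round_order(round_number: int) -> Tuple[str, str]:
    """Return which side runs first in a given round (rounds numbered from 1).

    Odd rounds run base then cand; even rounds run cand then base.
    """
    if round_number < 1:
        raise ValueError("round_number must be >= 1")
    return ("base", "cand") if round_number % 2 == 1 else ("cand", "base")


def schedule(rounds: int) -> List[Tuple[str, str]]:
    """Build the run order for every round of one evaluation."""
    return [round_order(r) for r in range(1, rounds + 1)]
```

With the N=3 default, `schedule(3)` yields two base-first rounds and one cand-first round, so the two orderings are not perfectly balanced unless the round count is even.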

Tested on Hetzner AX42-U: noise floor 0.035%, counterbalancing visibly
corrects position bias (round 2 Δ=-0.23% vs rounds 1,3 Δ=-0.48%).
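
One way to read those round deltas is to assume a purely additive position bias, which the commit does not state: base-first rounds then show the true effect plus the bias, and cand-first rounds show the true effect minus it. Under that assumption the two can be separated with a half-sum and half-difference. The helper name below is hypothetical, and -0.48 is treated as representative of rounds 1 and 3.

```python
from typing import Tuple


def split_effect_and_bias(delta_base_first: float, delta_cand_first: float) -> Tuple[float, float]:
    """Separate an order-independent effect from an additive position bias.

    Assumes base-first rounds observe (effect + bias) and cand-first rounds
    observe (effect - bias). Inputs and outputs are in percent.
    """
    effect = (delta_base_first + delta_cand_first) / 2
    bias = (delta_base_first - delta_cand_first) / 2
    return effect, bias


# Deltas quoted above: rounds 1 and 3 (base first) vs round 2 (cand first).
effect, bias = split_effect_and_bias(-0.48, -0.23)
```

For the quoted figures this gives an effect of about -0.355% and a bias of about -0.125%, both well above the 0.035% noise floor. These are illustrative values under the additive model, not measured results.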

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Barnadrot Barnadrot merged commit f76ce46 into main May 14, 2026
2 checks passed