feat/benchmark_new_version#65
Merged
Session result: 1 confirmed real win (h5 orphan, -0.83% wall-clock, p=8e-06, n=24, reproduced at n=40), 9 discards, 1 dead-end (D.2 already applied). Counter at 9/12: ended early because all further attempts on the SIMD Poseidon surface produced LLVM-neutralized null results and the h5 orphan is shippable on its own.

h5 mechanism: manual unroll of the 15-iter rank-1 update in the partial round SIMD body. This eliminated the 0x780(%rsp,%rsi,1) indexed-stack spill that perf annotate showed at 2.28%+1.31%=3.6% on baseline. Codegen verified: the vmovdqa64 0x780(%rsp,%rsi,1) instance count went from 2 (baseline) to 0 (h5 binary).

Held on branch as orphan (commit 943b65a1 on leanMultisig:pw4_2-2026-05-13). Recommend brain manually promote to shipped: the gain is real, statistically robust, codegen-verified, and free. Bundle attempts h6-h11 all failed to push the cumulative gain over the 1.0% gate threshold, but h5 alone is worth shipping.

Verdict + pr_body include: full hypothesis table, confirmed dead-ends (loop unrolls outside the h5 case, SSA-hoist, lambda fusion, x2 batching, prefetch, rayon pairing; all LLVM-neutralized), and a recommendation for follow-on profiling post-h5.
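The codegen check described above (instance count of the spill instruction going from 2 to 0) can be sketched as a simple count over disassembly text. This is a minimal illustration, not the tooling used in the session: the helper name `count_instruction` and both assembly excerpts are hypothetical, and only the searched instruction string comes from the commit message.

```python
def count_instruction(disassembly: str, needle: str) -> int:
    """Count disassembly lines containing `needle`, ignoring whitespace runs."""
    wanted = " ".join(needle.split())
    return sum(
        1
        for line in disassembly.splitlines()
        if wanted in " ".join(line.split())
    )


# The instruction string quoted in the commit message.
NEEDLE = "vmovdqa64 0x780(%rsp,%rsi,1)"

# Hypothetical excerpts for illustration only; not real output from either binary.
baseline_asm = """
  vmovdqa64   0x780(%rsp,%rsi,1),%zmm3
  vpaddd      %zmm1,%zmm3,%zmm3
  vmovdqa64   0x780(%rsp,%rsi,1),%zmm5
"""
candidate_asm = """
  vpaddd      %zmm1,%zmm3,%zmm3
  vpmuludq    %zmm2,%zmm4,%zmm4
"""

baseline_count = count_instruction(baseline_asm, NEEDLE)
candidate_count = count_instruction(candidate_asm, NEEDLE)
print(f"baseline={baseline_count} candidate={candidate_count}")
```

In practice the input would be the disassembly of the relevant function from each binary, so the count is not diluted by unrelated code that happens to use the same stack offset.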
Post pw4 audit: baseline drift inflated per-iter deltas by 5-10x due to position bias, low sample count, and page cache asymmetry. This commit addresses all identified noise sources:

- N=1→N=3 default (12 samples/side, ~85% power vs ~40%)
- Counterbalanced round ordering (odd=base→cand, even=cand→base)
- Core pinning via taskset (one thread per physical core)
- Page cache drop between sides (sync + drop_caches)
- A-vs-A noise floor calibration (soft-fail, no loop disruption)
- New eval_steady_state.sh for production-regime measurement (proof 200+)
- Auto-chain steady-state on keeps exceeding 2%

Tested on Hetzner AX42-U: noise floor 0.035%, counterbalancing visibly corrects position bias (round 2 Δ=-0.23% vs rounds 1,3 Δ=-0.48%).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
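The counterbalanced ordering listed above can be sketched roughly as follows. This is an illustrative Python sketch only, not the code in this PR: the function names `side_order`, `schedule`, and `percent_delta` are hypothetical, and rounds are assumed to be numbered from 1.

```python
from statistics import mean


def side_order(round_number: int) -> tuple[str, str]:
    """Odd rounds run base then cand; even rounds run cand then base."""
    return ("base", "cand") if round_number % 2 == 1 else ("cand", "base")


def schedule(rounds: int) -> list[tuple[str, str]]:
    """Execution order for every round, numbered from 1."""
    return [side_order(r) for r in range(1, rounds + 1)]


def percent_delta(base_times: list[float], cand_times: list[float]) -> float:
    """Mean cand time relative to mean base time, in percent (negative = faster)."""
    base_mean = mean(base_times)
    return (mean(cand_times) - base_mean) / base_mean * 100.0


# With the N=3 default, the sides alternate which one runs first.
for number, order in enumerate(schedule(3), start=1):
    print(f"round {number}: {' -> '.join(order)}")
```

Alternating which side runs first means that any cost tied to position (for example, a side that always runs second on a warmer machine) lands on both sides instead of always penalizing or favoring the candidate. The same `percent_delta` computation applied to two runs of the same binary gives an A-vs-A estimate of the noise floor.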
Redesigned benchmarking to improve experiment gate.