fix(ci): raise perf-gate noise floors to match observed runner variance #4867
Merged
The v0.5.1151 release-mode Regression Check hard-failed twice on bench_string_heavy (+35%, +73%) while the push-mode run on the SAME commit reported 7 improvements and no regressions: the row measures 60/81/104ms across three same-commit macos-14 runs, so the 20ms absolute floor lets pure runner noise through on every sub-150ms benchmark. Raise MIN_SPEED_DELTA_MS to 100ms (every observed real regression — the v0.5.1129 hang was ~4000x — clears it by orders of magnitude; CPU-bound rows over ~300ms track within ~10% between runners) and MIN_RAM_DELTA_KB to 4MB (nested_loops RSS swings ~1.8MB / +51% between runs without any allocation change).
The v0.5.1151 release-mode Regression Check hard-failed twice on bench_string_heavy (+73.3%: 60→104ms, then +35.0%: 60→81ms on re-run) while the push-mode run on the same commit reported 7 improvements, no regressions.

Three same-commit macos-14 runs measured that row at 60 / 81 / 104 ms, so the 20ms absolute noise floor (MIN_SPEED_DELTA_MS) lets pure runner-to-runner variance hard-fail releases on any sub-150ms benchmark.

- MIN_SPEED_DELTA_MS: 20 → 100ms. Every real regression we'd want a release gate to catch clears this by orders of magnitude (the v0.5.1129 dense-array hang was ~4000x / +6h). CPU-bound rows over ~300ms tracked within ~10% across runners.
- MIN_RAM_DELTA_KB: 2MB → 4MB. 10_nested_loops RSS swings ±1.8MB (+51%) between same-commit runs with no allocation change.

Follow-up to #4857 (which made the gate compare anything at all). Note: re-runs of the v0.5.1151 tag's Regression Check use the tag's checkout, so this only hardens future tags.
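
To make the arithmetic concrete, here is a minimal sketch of how an absolute noise floor treats the bench_string_heavy measurements quoted above. The gate's actual logic is not shown in this PR, so the helper names (`pct_change`, `clears_noise_floor`) and the strict "greater than" comparison are assumptions made for illustration only; the 20ms and 100ms floors and the 60 / 81 / 104 ms timings come from the description.

```python
# Illustrative sketch only: not the real Regression Check implementation.
# Helper names and the strict ">" comparison are assumptions.

OLD_MIN_SPEED_DELTA_MS = 20   # floor before this PR (from the description)
NEW_MIN_SPEED_DELTA_MS = 100  # floor after this PR (from the description)


def pct_change(baseline_ms: float, current_ms: float) -> float:
    """Relative slowdown of current vs. baseline, in percent."""
    return (current_ms - baseline_ms) / baseline_ms * 100.0


def clears_noise_floor(baseline_ms: float, current_ms: float, floor_ms: float) -> bool:
    """True when the absolute slowdown is larger than the noise floor."""
    return (current_ms - baseline_ms) > floor_ms


if __name__ == "__main__":
    baseline_ms = 60  # fastest of the three same-commit macos-14 runs
    for current_ms in (81, 104):
        print(
            f"{baseline_ms}->{current_ms}ms: "
            f"{pct_change(baseline_ms, current_ms):+.1f}%, "
            f"old floor trips: {clears_noise_floor(baseline_ms, current_ms, OLD_MIN_SPEED_DELTA_MS)}, "
            f"new floor trips: {clears_noise_floor(baseline_ms, current_ms, NEW_MIN_SPEED_DELTA_MS)}"
        )
```

Under these assumptions, both noisy re-runs (deltas of 21ms and 44ms) exceed the old 20ms floor but stay well under the new 100ms floor, which is the behavior the description argues for.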