ci: bench regression lane — ecsia/bitECS ratios under a ceiling#84
Conversation
…ling A dedicated CI job (noise-isolated from unit CI) times each ecsia iteration path against a SAME-RUN bitECS control and asserts the ns/entity RATIO stays under a committed ceiling (bench/regression-baseline.json). The ratio cancels shared-runner drift — a noisy runner moves ecsia and bitECS together — so a failure is a real regression (e.g. codegen breaking and bindColumns deopting from ~0.72x to ~1.5x), not scheduling noise. The test is gated behind BENCH_REGRESSION=1 so it runs ONLY in its dedicated job, never in the default pnpm test (where timing noise would flake unit CI). Best-of-3 p50 per path. Ceilings ratchet down in the baseline when a path durably improves; today bindColumns 0.9, eachChunk 1.3, each 9.0 (measured 0.72 / 1.08 / 7.4).
Greptile SummaryThis PR adds a dedicated
Confidence Score: 3/5The CI job and baseline file are safe; the benchmark harness has a structural issue that causes expensive computation to leak into the default test run. The bitECS control measurement runs at describe-body scope, so Vitest evaluates it during test collection on every bench/test/regression.bench.test.ts needs the bitECS control moved into a Important Files Changed
Prompt To Fix All With AIFix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
bench/test/regression.bench.test.ts:49-51
**`bit` runs at describe-body scope, defeating the `BENCH_REGRESSION` flag**
`const bit = nsPerEntity(makeBitEcsIter)` sits in the `describe` callback body, which Vitest evaluates during test *collection* — even when the suite is marked skip via `describe.skipIf`. Because the `bench` project (`bench/test/**/*.test.ts`) is included in the shared `vitest.config.ts` and `pnpm test` runs all projects, this fires the full bitECS measurement (3 × 1 800 iterations × 50 k entities) on every ordinary `pnpm test -- --coverage` CI run, directly contradicting the stated intent. Moving it into a `beforeAll` callback is the standard fix: Vitest does not invoke `beforeAll` for skipped suites, so the work is fully suppressed when `ENABLED` is false.
### Issue 2 of 2
bench/test/regression.bench.test.ts:49-51
**`bit` and ecsia paths measured in different execution phases**
Even when `ENABLED=true`, `bit` is measured during the `describe` callback (collection phase) while the three ecsia paths are measured inside `test.each` (execution phase). On a noisy shared runner, the machine state can differ noticeably between these two phases — other background work, JIT warm-up of unrelated code, etc. — slightly undermining the "same-run same-conditions" goal. Moving `bit` into a `beforeAll` would keep all timing in the same execution phase.
Reviews (1): Last reviewed commit: "ci: bench regression lane — ecsia/bitECS..." | Re-trigger Greptile |
| // ONE bitECS control measured in the same process/run as the ecsia paths below. | ||
| const bit = nsPerEntity(makeBitEcsIter) | ||
|
|
There was a problem hiding this comment.
bit runs at describe-body scope, defeating the BENCH_REGRESSION flag
const bit = nsPerEntity(makeBitEcsIter) sits in the describe callback body, which Vitest evaluates during test collection — even when the suite is marked skip via describe.skipIf. Because the bench project (bench/test/**/*.test.ts) is included in the shared vitest.config.ts and pnpm test runs all projects, this fires the full bitECS measurement (3 × 1 800 iterations × 50 k entities) on every ordinary pnpm test -- --coverage CI run, directly contradicting the stated intent. Moving it into a beforeAll callback is the standard fix: Vitest does not invoke beforeAll for skipped suites, so the work is fully suppressed when ENABLED is false.
Prompt To Fix With AI
This is a comment left during a code review.
Path: bench/test/regression.bench.test.ts
Line: 49-51
Comment:
**`bit` runs at describe-body scope, defeating the `BENCH_REGRESSION` flag**
`const bit = nsPerEntity(makeBitEcsIter)` sits in the `describe` callback body, which Vitest evaluates during test *collection* — even when the suite is marked skip via `describe.skipIf`. Because the `bench` project (`bench/test/**/*.test.ts`) is included in the shared `vitest.config.ts` and `pnpm test` runs all projects, this fires the full bitECS measurement (3 × 1 800 iterations × 50 k entities) on every ordinary `pnpm test -- --coverage` CI run, directly contradicting the stated intent. Moving it into a `beforeAll` callback is the standard fix: Vitest does not invoke `beforeAll` for skipped suites, so the work is fully suppressed when `ENABLED` is false.
How can I resolve this? If you propose a fix, please make it concise.| // ONE bitECS control measured in the same process/run as the ecsia paths below. | ||
| const bit = nsPerEntity(makeBitEcsIter) | ||
|
|
There was a problem hiding this comment.
bit and ecsia paths measured in different execution phases
Even when ENABLED=true, bit is measured during the describe callback (collection phase) while the three ecsia paths are measured inside test.each (execution phase). On a noisy shared runner, the machine state can differ noticeably between these two phases — other background work, JIT warm-up of unrelated code, etc. — slightly undermining the "same-run same-conditions" goal. Moving bit into a beforeAll would keep all timing in the same execution phase.
Prompt To Fix With AI
This is a comment left during a code review.
Path: bench/test/regression.bench.test.ts
Line: 49-51
Comment:
**`bit` and ecsia paths measured in different execution phases**
Even when `ENABLED=true`, `bit` is measured during the `describe` callback (collection phase) while the three ecsia paths are measured inside `test.each` (execution phase). On a noisy shared runner, the machine state can differ noticeably between these two phases — other background work, JIT warm-up of unrelated code, etc. — slightly undermining the "same-run same-conditions" goal. Moving `bit` into a `beforeAll` would keep all timing in the same execution phase.
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
What
The guard for the perf program (#82 and the follow-ups). A dedicated CI job, noise-isolated from unit CI, that times each ecsia iteration path against a same-run bitECS control and asserts the ns/entity ratio stays under a committed ceiling.
Why ratios, not absolute ns: a shared GitHub runner is noisy in absolute terms, but it moves ecsia and bitECS together — the ratio is stable. So a failure means a genuine regression (codegen breaks →
bindColumnsdeopts from ~0.72× to ~1.5×), not scheduling noise.BENCH_REGRESSION=1→ runs only in the dedicatedbench-regressionjob, never in the defaultpnpm test(so timing noise can't flake unit CI; verified it skips by default).bench/regression-baseline.json(ratchet down on durable wins):bindColumns0.9,eachChunk1.3,each9.0 — vs measured 0.72 / 1.08 / 7.4.Verification
Locally: passes flagged (3 paths under ceiling), skips unflagged. ~40s job runtime.