ci: bench regression lane — ecsia/bitECS ratios under a ceiling by andymai · Pull Request #84 · andymai/ecsia

andymai · 2026-06-08T08:45:10Z

What

The guard for the perf program (#82 and the follow-ups). A dedicated CI job, noise-isolated from unit CI, that times each ecsia iteration path against a same-run bitECS control and asserts the ns/entity ratio stays under a committed ceiling.

Why ratios, not absolute ns: a shared GitHub runner is noisy in absolute terms, but it moves ecsia and bitECS together — the ratio is stable. So a failure means a genuine regression (codegen breaks → bindColumns deopts from ~0.72× to ~1.5×), not scheduling noise.
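As a rough illustration of why the ratio holds up when absolute timings do not, the sketch below models runner drift as a single multiplicative factor applied to both measurements. The `Sample` type, the helper names, and the ns/entity figures are hypothetical and chosen only so the quiet-runner ratio comes out at the 0.72 quoted above; real drift is unlikely to be perfectly uniform.

```typescript
// Hypothetical ns/entity timings for one iteration path and its same-run control.
interface Sample {
  ecsia: number
  bitEcs: number
}

// Ratio of the path under test to the bitECS control.
function ratio(s: Sample): number {
  return s.ecsia / s.bitEcs
}

// Assumption: shared-runner drift scales both timings by the same factor.
function withDrift(s: Sample, factor: number): Sample {
  return { ecsia: s.ecsia * factor, bitEcs: s.bitEcs * factor }
}

const quiet: Sample = { ecsia: 3.6, bitEcs: 5.0 } // illustrative numbers only
const noisy: Sample = withDrift(quiet, 1.8) // both sides 80% slower

// Absolute ns/entity changes a lot, the ratio does not.
console.log(quiet.ecsia, noisy.ecsia)
console.log(ratio(quiet).toFixed(2), ratio(noisy).toFixed(2))
```

Under this model only a change that affects ecsia and not bitECS moves the ratio, which is the kind of change the lane is meant to catch.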

Gated behind BENCH_REGRESSION=1 → runs only in the dedicated bench-regression job, never in the default pnpm test (so timing noise can't flake unit CI; verified it skips by default).
Best-of-3 p50 per path at 50k entities.
Ceilings in bench/regression-baseline.json (ratchet down on durable wins): bindColumns 0.9, eachChunk 1.3, each 9.0 — vs measured 0.72 / 1.08 / 7.4.
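A minimal sketch of the check described in the list above, under stated assumptions: "best-of-3 p50" is read here as the lowest median across three sample batches, and "under a ceiling" as a strict `<`. The helper names `p50`, `bestOfP50`, and `checkRatio` are hypothetical and not taken from the harness; only the ceiling values come from the PR description.

```typescript
// Median (p50) of one batch of ns/entity samples.
function p50(samples: number[]): number {
  const sorted = [...samples].sort((a, b) => a - b)
  const mid = Math.floor(sorted.length / 2)
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2
}

// Assumed reading of "best-of-N p50": the lowest median across N batches.
function bestOfP50(batches: number[][]): number {
  return Math.min(...batches.map(p50))
}

// Ceilings as quoted in the PR description.
const ceilings: Record<string, number> = { bindColumns: 0.9, eachChunk: 1.3, each: 9.0 }

// Compare one path's ns/entity against the same-run control.
function checkRatio(path: string, ecsiaNs: number, bitEcsNs: number): { ratio: number; ok: boolean } {
  const ratio = ecsiaNs / bitEcsNs
  return { ratio, ok: ratio < ceilings[path] } // strict "<" is an assumption
}

// Illustrative inputs: a healthy bindColumns run and a deopted one (~1.5x).
console.log(checkRatio("bindColumns", 3.6, 5.0))
console.log(checkRatio("bindColumns", 7.5, 5.0))
```

With these illustrative inputs the first call passes at a ratio of about 0.72, and the second fails at 1.5, matching the deopt scenario described in the PR.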

Verification

Locally: passes flagged (3 paths under ceiling), skips unflagged. ~40s job runtime.

…ling A dedicated CI job (noise-isolated from unit CI) times each ecsia iteration path against a SAME-RUN bitECS control and asserts the ns/entity RATIO stays under a committed ceiling (bench/regression-baseline.json). The ratio cancels shared-runner drift — a noisy runner moves ecsia and bitECS together — so a failure is a real regression (e.g. codegen breaking and bindColumns deopting from ~0.72x to ~1.5x), not scheduling noise. The test is gated behind BENCH_REGRESSION=1 so it runs ONLY in its dedicated job, never in the default pnpm test (where timing noise would flake unit CI). Best-of-3 p50 per path. Ceilings ratchet down in the baseline when a path durably improves; today bindColumns 0.9, eachChunk 1.3, each 9.0 (measured 0.72 / 1.08 / 7.4).

greptile-apps · 2026-06-08T08:49:22Z

Greptile Summary

This PR adds a dedicated bench-regression CI job that times each ecsia iteration path against a same-run bitECS control and asserts the ns/entity ratio stays under a committed ceiling, using ratio-based comparison to cancel shared-runner noise.

New CI job (.github/workflows/ci.yml): isolated from unit CI, runs only when BENCH_REGRESSION=1, measures 3 paths (bindColumns, eachChunk, each) with best-of-3 p50 at 50 k entities.
New baseline file (bench/regression-baseline.json): stores ratio ceilings (0.9 / 1.3 / 9.0) set above the measured 0.72 / 1.08 / 7.4, leaving roughly 20-25% headroom.
Regression harness (bench/test/regression.bench.test.ts): the describe.skipIf(!ENABLED) guard does not prevent the describe body from executing during collection, so the bitECS measurement runs on every pnpm test invocation.

Confidence Score: 3/5

The CI job and baseline file are safe; the benchmark harness has a structural issue that causes expensive computation to leak into the default test run.

The bitECS control measurement runs at describe-body scope, so Vitest evaluates it during test collection on every pnpm test run, defeating the BENCH_REGRESSION flag meant to isolate the expensive computation to the dedicated CI job.

bench/test/regression.bench.test.ts needs the bitECS control moved into a beforeAll block to honour the skip guard.
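To make the collection-versus-execution distinction concrete, here is a toy model of a collect-then-run test runner. It is not Vitest's implementation; `toyDescribeSkipIf`, `toyBeforeAll`, `toyTest`, `run`, and `measureControl` are all hypothetical stand-ins. It only mirrors the behaviour the review describes: the describe body runs during collection even for a skipped suite, while hooks and tests of a skipped suite never fire.

```typescript
type Hook = () => void

interface Suite {
  skipped: boolean
  beforeAll: Hook[]
  tests: Hook[]
}

let current: Suite | null = null

// Collection: the body always runs so that tests can be registered.
function toyDescribeSkipIf(skip: boolean, body: () => void): Suite {
  const suite: Suite = { skipped: skip, beforeAll: [], tests: [] }
  current = suite
  body()
  current = null
  return suite
}

function toyBeforeAll(fn: Hook): void {
  current!.beforeAll.push(fn)
}

function toyTest(fn: Hook): void {
  current!.tests.push(fn)
}

// Execution: a skipped suite runs neither its hooks nor its tests.
function run(suite: Suite): void {
  if (suite.skipped) return
  suite.beforeAll.forEach((hook) => hook())
  suite.tests.forEach((t) => t())
}

// Stand-in for the expensive control measurement; counts how often it runs.
let measurements = 0
function measureControl(): number {
  measurements++
  return 5.0
}

// Shape flagged in the review: the measurement sits in the describe body.
const eager = toyDescribeSkipIf(true, () => {
  const bit = measureControl()
  toyTest(() => void bit)
})
run(eager)
const afterEager = measurements // the work ran although the suite is skipped

// Suggested shape: the measurement is deferred into a beforeAll hook.
const deferred = toyDescribeSkipIf(true, () => {
  let bit = 0
  toyBeforeAll(() => {
    bit = measureControl()
  })
  toyTest(() => void bit)
})
run(deferred)
const afterDeferred = measurements // no additional work for the skipped suite

console.log({ afterEager, afterDeferred })
```

In this model the eager shape pays for one measurement on a skipped suite and the deferred shape pays for none, which is the effect the `beforeAll` move is meant to achieve in the real harness.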

Important Files Changed

| Filename | Overview |
| --- | --- |
| bench/test/regression.bench.test.ts | New bench regression harness; the bitECS control measurement sits at describe-body scope so it executes during Vitest collection even when the suite is skipped, running on every `pnpm test` invocation despite the BENCH_REGRESSION flag. |
| .github/workflows/ci.yml | Adds a noise-isolated `bench-regression` CI job; structure mirrors existing jobs cleanly with a pinned pnpm action and explicit Node 24. |
| bench/regression-baseline.json | New ratio ceiling file; ceilings are generous enough (~20-25% headroom over measured values) to absorb runner noise without false positives. |
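The headroom figure can be checked directly from the ceilings and measured ratios quoted in the PR description. The sketch below is plain arithmetic; the variable and function names are illustrative and the shape of the real baseline file is not assumed.

```typescript
// Ceilings and measured ratios as quoted in the PR description.
const ceiling = { bindColumns: 0.9, eachChunk: 1.3, each: 9.0 }
const measuredRatio = { bindColumns: 0.72, eachChunk: 1.08, each: 7.4 }

// Headroom of a ceiling over the measured ratio, in percent.
function headroomPct(ceilingValue: number, measuredValue: number): number {
  return (ceilingValue / measuredValue - 1) * 100
}

const headroom = {
  bindColumns: headroomPct(ceiling.bindColumns, measuredRatio.bindColumns),
  eachChunk: headroomPct(ceiling.eachChunk, measuredRatio.eachChunk),
  each: headroomPct(ceiling.each, measuredRatio.each),
}

console.log(headroom)
```

This works out to roughly 25%, 20%, and 22% for bindColumns, eachChunk, and each respectively, consistent with the ~20-25% range stated in the review.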

Prompt To Fix All With AI

Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
bench/test/regression.bench.test.ts:49-51
**`bit` runs at describe-body scope, defeating the `BENCH_REGRESSION` flag**

`const bit = nsPerEntity(makeBitEcsIter)` sits in the `describe` callback body, which Vitest evaluates during test *collection* — even when the suite is marked skip via `describe.skipIf`. Because the `bench` project (`bench/test/**/*.test.ts`) is included in the shared `vitest.config.ts` and `pnpm test` runs all projects, this fires the full bitECS measurement (3 × 1 800 iterations × 50 k entities) on every ordinary `pnpm test -- --coverage` CI run, directly contradicting the stated intent. Moving it into a `beforeAll` callback is the standard fix: Vitest does not invoke `beforeAll` for skipped suites, so the work is fully suppressed when `ENABLED` is false.

### Issue 2 of 2
bench/test/regression.bench.test.ts:49-51
**`bit` and ecsia paths measured in different execution phases**

Even when `ENABLED=true`, `bit` is measured during the `describe` callback (collection phase) while the three ecsia paths are measured inside `test.each` (execution phase). On a noisy shared runner, the machine state can differ noticeably between these two phases — other background work, JIT warm-up of unrelated code, etc. — slightly undermining the "same-run same-conditions" goal. Moving `bit` into a `beforeAll` would keep all timing in the same execution phase.


greptile-apps · 2026-06-08T08:49:26Z

+  // ONE bitECS control measured in the same process/run as the ecsia paths below.
+  const bit = nsPerEntity(makeBitEcsIter)
+


bit runs at describe-body scope, defeating the BENCH_REGRESSION flag

const bit = nsPerEntity(makeBitEcsIter) sits in the describe callback body, which Vitest evaluates during test collection — even when the suite is marked skip via describe.skipIf. Because the bench project (bench/test/**/*.test.ts) is included in the shared vitest.config.ts and pnpm test runs all projects, this fires the full bitECS measurement (3 × 1 800 iterations × 50 k entities) on every ordinary pnpm test -- --coverage CI run, directly contradicting the stated intent. Moving it into a beforeAll callback is the standard fix: Vitest does not invoke beforeAll for skipped suites, so the work is fully suppressed when ENABLED is false.


greptile-apps · 2026-06-08T08:49:27Z

+  // ONE bitECS control measured in the same process/run as the ecsia paths below.
+  const bit = nsPerEntity(makeBitEcsIter)
+


bit and ecsia paths measured in different execution phases

Even when ENABLED=true, bit is measured during the describe callback (collection phase) while the three ecsia paths are measured inside test.each (execution phase). On a noisy shared runner, the machine state can differ noticeably between these two phases — other background work, JIT warm-up of unrelated code, etc. — slightly undermining the "same-run same-conditions" goal. Moving bit into a beforeAll would keep all timing in the same execution phase.
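One alternative to `beforeAll` that also keeps the control measurement in the execution phase is a memoized lazy getter: nothing runs when the getter is defined, and the first test that needs the value triggers the measurement. This is not what the review proposes, only a sketch of the same idea. The `once` helper and the stand-in measurement are hypothetical.

```typescript
// Wrap a computation so it runs at most once, on first use.
function once<T>(compute: () => T): () => T {
  let cached: { value: T } | undefined
  return () => {
    if (cached === undefined) cached = { value: compute() }
    return cached.value
  }
}

// Stand-in for the bitECS control measurement; counts invocations.
let controlRuns = 0
const getBit = once(() => {
  controlRuns++
  return 5.0 // illustrative ns/entity
})

const runsAtDefinition = controlRuns // nothing measured yet

// Each "test" asks for the control value when it actually executes.
const first = getBit()
const second = getBit()

console.log({ runsAtDefinition, controlRuns, first, second })
```

Compared with `beforeAll`, this defers the cost to the first test that reads the value, so a suite whose tests never run never pays for the measurement.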


andymai enabled auto-merge (squash) June 8, 2026 08:45

andymai merged commit 36dd4e4 into main Jun 8, 2026
9 checks passed

andymai deleted the ci/bench-regression-lane branch June 8, 2026 08:46

andymai mentioned this pull request Jun 8, 2026

ci: bundle-size budget — tree-shaken min+gzip, CI-gated #85

Merged

greptile-apps Bot reviewed Jun 8, 2026

andymai merged 1 commit into main from ci/bench-regression-lane
