ci: bench regression lane — ecsia/bitECS ratios under a ceiling#84

Merged
andymai merged 1 commit into main from ci/bench-regression-lane on Jun 8, 2026


@andymai andymai commented Jun 8, 2026

What

The guard for the perf program (#82 and the follow-ups). A dedicated CI job, noise-isolated from unit CI, that times each ecsia iteration path against a same-run bitECS control and asserts the ns/entity ratio stays under a committed ceiling.

Why ratios, not absolute ns: a shared GitHub runner is noisy in absolute terms, but it moves ecsia and bitECS together — the ratio is stable. So a failure means a genuine regression (codegen breaks → bindColumns deopts from ~0.72× to ~1.5×), not scheduling noise.
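The ratio argument can be illustrated with a small sketch. The absolute ns/entity timings and the 1.8x slowdown factor below are made-up assumptions, chosen only so that the quiet-runner ratio matches the ~0.72x quoted above; the model also assumes runner noise scales both measurements by the same factor, which is the premise of this PR rather than something the sketch proves.

```typescript
// Hypothetical ns/entity timings on a quiet runner (illustrative values only).
const quiet = { ecsia: 3.6, bitEcs: 5.0 };

// Assumed shared slowdown: a noisy runner scales BOTH measurements alike.
const slowdown = 1.8;
const noisy = {
  ecsia: quiet.ecsia * slowdown,
  bitEcs: quiet.bitEcs * slowdown,
};

/** ecsia time divided by the same-run bitECS control time. */
function ratio(t: { ecsia: number; bitEcs: number }): number {
  return t.ecsia / t.bitEcs;
}

// Absolute timings differ by 1.8x, but the ratio is unchanged (up to rounding).
console.log(ratio(quiet).toFixed(2), ratio(noisy).toFixed(2));
```

Under that assumption an absolute-ns threshold would trip on the noisy run, while a ratio threshold would not.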

  • Gated behind BENCH_REGRESSION=1 → runs only in the dedicated bench-regression job, never in the default pnpm test (so timing noise can't flake unit CI; verified it skips by default).
  • Best-of-3 p50 per path at 50k entities.
  • Ceilings in bench/regression-baseline.json (ratchet down on durable wins): bindColumns 0.9, eachChunk 1.3, each 9.0 — vs measured 0.72 / 1.08 / 7.4.
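A minimal sketch of how the check described in these bullets could be wired together. This is not the harness from the PR: the helper names (`p50`, `bestOfP50`, `underCeiling`), the shape of the ceilings object, reading "best-of-3 p50" as the lowest of three medians, and using a strict `<` comparison are all assumptions made for illustration. Only the ceiling values come from the description.

```typescript
type PathName = "bindColumns" | "eachChunk" | "each";

// Ceiling values quoted in the PR; the real JSON layout is an assumption.
const ceilings: Record<PathName, number> = {
  bindColumns: 0.9,
  eachChunk: 1.3,
  each: 9.0,
};

/** Median (p50) of a non-empty list of samples. */
function p50(samples: number[]): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

/** Assumed meaning of "best-of-N p50": the lowest median across N rounds. */
function bestOfP50(rounds: number[][]): number {
  return Math.min(...rounds.map(p50));
}

/** True when the ecsia/bitECS ratio is below the committed ceiling. */
function underCeiling(path: PathName, ecsiaNs: number, bitEcsNs: number): boolean {
  return ecsiaNs / bitEcsNs < ceilings[path];
}

// With the control normalised to 1, the measured 0.72 passes and a 1.5 deopt fails.
console.log(underCeiling("bindColumns", 0.72, 1), underCeiling("bindColumns", 1.5, 1));
```

Taking the lowest median rather than the mean is one way to discard rounds disturbed by a transient stall, which fits the noise-isolation goal stated above.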

Verification

Locally: passes with BENCH_REGRESSION=1 set (all 3 paths under their ceilings) and skips without it. Job runtime is about 40s.

…ling

A dedicated CI job (noise-isolated from unit CI) times each ecsia
iteration path against a SAME-RUN bitECS control and asserts the
ns/entity RATIO stays under a committed ceiling
(bench/regression-baseline.json). The ratio cancels shared-runner drift
— a noisy runner moves ecsia and bitECS together — so a failure is a
real regression (e.g. codegen breaking and bindColumns deopting from
~0.72x to ~1.5x), not scheduling noise.

The test is gated behind BENCH_REGRESSION=1 so it runs ONLY in its
dedicated job, never in the default pnpm test (where timing noise would
flake unit CI). Best-of-3 p50 per path. Ceilings ratchet down in the
baseline when a path durably improves; today bindColumns 0.9, eachChunk
1.3, each 9.0 (measured 0.72 / 1.08 / 7.4).
@andymai andymai enabled auto-merge (squash) June 8, 2026 08:45
@andymai andymai merged commit 36dd4e4 into main Jun 8, 2026
9 checks passed
@andymai andymai deleted the ci/bench-regression-lane branch June 8, 2026 08:46

greptile-apps Bot commented Jun 8, 2026

Greptile Summary

This PR adds a dedicated bench-regression CI job that times each ecsia iteration path against a same-run bitECS control and asserts the ns/entity ratio stays under a committed ceiling, using ratio-based comparison to cancel shared-runner noise.

  • New CI job (.github/workflows/ci.yml): isolated from unit CI, runs only when BENCH_REGRESSION=1, measures 3 paths (bindColumns, eachChunk, each) with best-of-3 p50 at 50 k entities.
  • New baseline file (bench/regression-baseline.json): stores ratio ceilings (0.9 / 1.3 / 9.0) set above the measured 0.72 / 1.08 / 7.4, with ~20-25% headroom.
  • Regression harness (bench/test/regression.bench.test.ts): the describe.skipIf(!ENABLED) guard does not prevent the describe body from executing during collection, so the bitECS measurement runs on every pnpm test invocation.
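The headroom figure can be checked from the numbers quoted in the PR description. The snippet below is only an arithmetic check; the object layout and the `headroomPct` helper are illustrative and do not reflect the structure of the baseline file.

```typescript
// Ceilings and measured ratios as quoted in the PR description.
const paths = [
  { name: "bindColumns", ceiling: 0.9, measured: 0.72 },
  { name: "eachChunk", ceiling: 1.3, measured: 1.08 },
  { name: "each", ceiling: 9.0, measured: 7.4 },
];

/** Headroom of a ceiling over its measured ratio, in percent. */
function headroomPct(ceiling: number, measured: number): number {
  return (ceiling / measured - 1) * 100;
}

for (const p of paths) {
  console.log(`${p.name}: ${headroomPct(p.ceiling, p.measured).toFixed(1)}% headroom`);
}
```

This works out to roughly 25%, 20%, and 22% for the three paths, consistent with the ~20-25% range stated above.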

Confidence Score: 3/5

The CI job and baseline file are safe; the benchmark harness has a structural issue that causes expensive computation to leak into the default test run.

The bitECS control measurement runs at describe-body scope, so Vitest evaluates it during test collection on every pnpm test run, defeating the BENCH_REGRESSION flag meant to isolate the expensive computation to the dedicated CI job.

bench/test/regression.bench.test.ts needs the bitECS control moved into a beforeAll block to honour the skip guard.

Important Files Changed

| Filename | Overview |
| --- | --- |
| bench/test/regression.bench.test.ts | New bench regression harness; the bitECS control measurement sits at describe-body scope so it executes during Vitest collection even when the suite is skipped, running on every pnpm test invocation despite the BENCH_REGRESSION flag. |
| .github/workflows/ci.yml | Adds a noise-isolated bench-regression CI job; structure mirrors existing jobs cleanly with a pinned pnpm action and explicit Node 24. |
| bench/regression-baseline.json | New ratio ceiling file; ceilings are generous enough (~20-25% headroom over measured values) to absorb runner noise without false positives. |

Reviews (1): Last reviewed commit: "ci: bench regression lane — ecsia/bitECS..."

Comment on lines +49 to +51
// ONE bitECS control measured in the same process/run as the ecsia paths below.
const bit = nsPerEntity(makeBitEcsIter)

P1 bit runs at describe-body scope, defeating the BENCH_REGRESSION flag

const bit = nsPerEntity(makeBitEcsIter) sits in the describe callback body, which Vitest evaluates during test collection — even when the suite is marked skip via describe.skipIf. Because the bench project (bench/test/**/*.test.ts) is included in the shared vitest.config.ts and pnpm test runs all projects, this fires the full bitECS measurement (3 × 1 800 iterations × 50 k entities) on every ordinary pnpm test -- --coverage CI run, directly contradicting the stated intent. Moving it into a beforeAll callback is the standard fix: Vitest does not invoke beforeAll for skipped suites, so the work is fully suppressed when ENABLED is false.
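The collection-versus-execution distinction in this comment can be shown with a toy two-phase runner. This is a deliberately simplified model and not Vitest's implementation: `toyDescribeSkipIf`, `toyBeforeAll`, `toyTest`, `runSuites`, and `measureControl` are hypothetical names, and the model only encodes the behaviour the comment describes (suite bodies run while collecting, hooks and tests run later and only for suites that are not skipped).

```typescript
type Fn = () => void;
interface Suite { skipped: boolean; hooks: Fn[]; tests: Fn[] }

const suites: Suite[] = [];
let current: Suite | null = null;
const log: string[] = [];

/** Toy stand-in for a skip-guarded describe: the body ALWAYS runs at collection. */
function toyDescribeSkipIf(skip: boolean, body: Fn): void {
  const suite: Suite = { skipped: skip, hooks: [], tests: [] };
  current = suite;
  body(); // collection phase: executes even when the suite is skipped
  current = null;
  suites.push(suite);
}

function toyBeforeAll(fn: Fn): void {
  if (!current) throw new Error("toyBeforeAll called outside a suite");
  current.hooks.push(fn); // only registered here, not executed
}

function toyTest(fn: Fn): void {
  if (!current) throw new Error("toyTest called outside a suite");
  current.tests.push(fn);
}

/** Execution phase: hooks and tests run only for suites that are not skipped. */
function runSuites(): void {
  for (const s of suites) {
    if (s.skipped) continue;
    s.hooks.forEach((h) => h());
    s.tests.forEach((t) => t());
  }
}

// Stand-in for the expensive control measurement; records where it ran.
function measureControl(where: string): number {
  log.push(where);
  return 1;
}

toyDescribeSkipIf(true, () => {
  const eager = measureControl("describe body"); // runs despite the skip
  let lazy = 0;
  toyBeforeAll(() => { lazy = measureControl("beforeAll"); }); // never runs when skipped
  toyTest(() => { void eager; void lazy; });
});
runSuites();

console.log(log); // only the describe-body measurement was recorded
```

In this model the eager measurement is paid on every run, while the hook-based one is deferred and suppressed for a skipped suite, which is the effect the suggested `beforeAll` move relies on.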

Comment on lines +49 to +51
// ONE bitECS control measured in the same process/run as the ecsia paths below.
const bit = nsPerEntity(makeBitEcsIter)

P2 bit and ecsia paths measured in different execution phases

Even when ENABLED=true, bit is measured during the describe callback (collection phase) while the three ecsia paths are measured inside test.each (execution phase). On a noisy shared runner, the machine state can differ noticeably between these two phases — other background work, JIT warm-up of unrelated code, etc. — slightly undermining the "same-run same-conditions" goal. Moving bit into a beforeAll would keep all timing in the same execution phase.

